Using LowerCaseStrField with search indexes
The custom LowerCaseStrField field type provides the following features:
- Converts the data to lowercase and correctly stores the lowercase data in docValues.
- Converts the query values to lowercase.
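The two behaviors in this list can be pictured with a small in-memory model. This is only an illustrative sketch, not DSE Search code: the class name `LowerCaseIndexSketch` and its methods are hypothetical, and the real field type works inside the search index rather than in Python dictionaries.

```python
class LowerCaseIndexSketch:
    """Toy model of a field that lowercases both stored and queried values."""

    def __init__(self):
        self._postings = {}   # lowercased term -> set of document ids
        self.doc_values = {}  # document id -> lowercased value

    def add(self, doc_id, value):
        # Index time: lowercase once, then use the same value for the
        # searchable term and for the docValues-style column.
        lowered = value.lower()
        self._postings.setdefault(lowered, set()).add(doc_id)
        self.doc_values[doc_id] = lowered

    def search(self, query_value):
        # Query time: lowercase the query value before the lookup.
        return sorted(self._postings.get(query_value.lower(), set()))


index = LowerCaseIndexSketch()
index.add(1, "Chicago")
index.add(2, "CHICAGO")
matches = index.search("chicago")  # both documents match
```

Because the same lowercased value is used for lookup and for the stored column, searching and reading back stored values agree on case.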
You cannot apply |
DataStax advises against using TextField with solr.KeywordTokenizer and solr.LowerCaseFilterFactory. Unintended search results could occur because the raw data was not stored as lowercase in docValues, contrary to expectations. Instead, use the custom LowerCaseStrField type as described in this topic.
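To see why raw-case docValues can surprise you, consider a rough sketch that groups values the way a facet over stored values might. It is an illustration only: the helper `facet_counts` is hypothetical, and the assumption that faceting is one of the affected operations is ours, not a statement from this topic.

```python
from collections import Counter


def facet_counts(doc_values):
    """Count documents per distinct stored value, as a facet would."""
    return dict(Counter(doc_values))


raw_values = ["Chicago", "chicago", "CHICAGO"]

# Raw data kept as-is: each spelling becomes its own bucket.
raw_facets = facet_counts(raw_values)

# Data lowercased before it is stored: one bucket for all spellings.
lowered_facets = facet_counts([value.lower() for value in raw_values])
```

In the first case the three spellings are counted separately even though a lowercased query would match all three documents. In the second case the stored values and the query agree.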
For example, to use LowerCaseStrField on a field in a new index:
cqlsh -e "CREATE SEARCH INDEX ON healthcare.health_data WITH COLUMNS *, birthplace { lowerCase : true };"
The command creates a search index in which birthplace uses the LowerCaseStrField field type. The field type is added to the schema automatically.
To view the elements in the generated schema XML, you can use a cqlsh or dsetool command.
Examples:
dsetool get_core_schema healthcare.health_data
cqlsh -e "DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON healthcare.health_data;"
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
<fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField" name="LowerCaseStrField"/>
<fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
</types>
<fields>
<field indexed="true" multiValued="false" name="grade_completed" type="StrField"/>
...
<field docValues="true" indexed="true" multiValued="false" name="birthplace" type="LowerCaseStrField"/>
<field docValues="true" indexed="true" multiValued="false" name="income_group" type="TrieIntField"/>
...
</fields>
<uniqueKey>(id,age)</uniqueKey>
</schema>
To add a new field to an existing index schema with the LowerCaseStrField field type, you can:
- Use the ALTER SEARCH INDEX SCHEMA command in cqlsh.
- Display the current schema with dsetool get_core_schema, edit the XML manually, and use dsetool write_resource to update the schema with your edited schema XML. Refer to dsetool get_core_schema and dsetool write_resource.
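If you take the manual route, the XML edit itself can be scripted. The following is a minimal sketch, assuming the schema has the `<types>` and `<fields>` layout shown in the output earlier in this topic. The function name `add_lowercase_field` is hypothetical, and the script only transforms XML text; it does not call dsetool.

```python
import xml.etree.ElementTree as ET

LOWERCASE_CLASS = "com.datastax.bdp.search.solr.core.types.LowerCaseStrField"


def add_lowercase_field(schema_xml, field_name):
    """Return schema XML with field_name declared as a LowerCaseStrField."""
    root = ET.fromstring(schema_xml.encode("utf-8"))
    types = root.find("types")
    fields = root.find("fields")
    if types is None or fields is None:
        raise ValueError("schema must contain <types> and <fields> elements")

    # Declare the field type only if the schema does not already have it.
    if not any(t.get("name") == "LowerCaseStrField" for t in types.findall("fieldType")):
        ET.SubElement(types, "fieldType",
                      {"class": LOWERCASE_CLASS, "name": "LowerCaseStrField"})

    # Add the field only if no field with that name exists yet.
    if not any(f.get("name") == field_name for f in fields.findall("field")):
        ET.SubElement(fields, "field", {
            "docValues": "true", "indexed": "true", "multiValued": "false",
            "name": field_name, "type": "LowerCaseStrField",
        })
    return ET.tostring(root, encoding="unicode")
```

You would feed this function the text returned by dsetool get_core_schema, save the result to a file, and upload that file with dsetool write_resource. The attribute values mirror the birthplace field in the generated schema; adjust them if your field needs different settings.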
For example, in cqlsh, the following command adds the new field medicalNotes with the LowerCaseStrField field type, if the field does not already exist:
ALTER SEARCH INDEX SCHEMA ON healthcare.health_data ADD lowerCaseString medicalNotes;
DESCRIBE PENDING SEARCH INDEX SCHEMA ON healthcare.health_data;
Remember: Whichever command you choose, cqlsh or dsetool, be sure to RELOAD and REBUILD the search index in each datacenter in the cluster.
RELOAD SEARCH INDEX ON healthcare.health_data;
REBUILD SEARCH INDEX ON healthcare.health_data;
DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON healthcare.health_data;
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
<fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField" name="LowerCaseStrField"/>
<fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
</types>
<fields>
...
<field docValues="true" indexed="true" multiValued="false" name="birthplace" type="LowerCaseStrField"/>
...
<field docValues="true" indexed="true" multiValued="false" name="medicalNotes" type="LowerCaseStrField"/>
</fields>
<uniqueKey>(id,age)</uniqueKey>
</schema>
There is a workaround to apply
The search query is case insensitive. All queries are converted to lowercase and return the same result. For example, searches for the following values return the same result:
|