Using LowerCaseStrField with search indexes
Converts data into lowercase and correctly stores it in docValues.
DataStax Enterprise 6.0.8 introduces a custom field type, LowerCaseStrField, which provides the following features:
- Converts the data into lowercase and correctly stores the lowercase data in docValues.
- Converts the query values to lowercase.
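The toy Python model below makes the two conversions concrete. It is a simplified illustration, not DSE code, and the class name `ToyLowerCaseStrField` and its methods are hypothetical. Python's `str.lower()` stands in for whatever lowercasing DSE performs internally. The model lowercases each value before storing it, standing in for the indexed term and its docValues entry. It also lowercases each query value before an exact match, so queries that differ only in case find the same rows.

```python
from collections import defaultdict


class ToyLowerCaseStrField:
    """Hypothetical, simplified model of LowerCaseStrField (not the DSE implementation)."""

    def __init__(self):
        # Maps the stored (lowercased) value to the ids of the rows that hold it.
        self._postings = defaultdict(set)

    def index(self, row_id, raw_value):
        # Index time: the value is lowercased before it is stored.
        self._postings[raw_value.lower()].add(row_id)

    def stored_values(self):
        # The values a sort or facet over docValues would see.
        return sorted(self._postings)

    def query(self, raw_query):
        # Query time: the query value is lowercased too, then matched exactly.
        return sorted(self._postings.get(raw_query.lower(), set()))


if __name__ == "__main__":
    field = ToyLowerCaseStrField()
    field.index(1, "Boston")
    field.index(2, "BOSTON")
    field.index(3, "Denver")
    print(field.stored_values())  # ['boston', 'denver']
    for q in ("boston", "Boston", "BOSTON"):
        print(q, field.query(q))  # each line ends with [1, 2]
```

Because both sides are normalized, the stored values contain no mixed-case duplicates, and the casing of the query does not affect which rows match.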
Note: You cannot apply LowerCaseStrField to a table's primary key. You also cannot use any analyzers with LowerCaseStrField.
DataStax advises against using TextField with solr.KeywordTokenizer and solr.LowerCaseFilterFactory. With that configuration the raw data is not stored as lowercase in docValues, contrary to what you might expect, which can produce unintended search results. Instead, use the custom LowerCaseStrField type as described in this topic.
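The following toy Python model illustrates why the TextField approach can surprise you. It is not DSE code, and the function names are hypothetical. It assumes, as the note states, that the analyzer chain lowercases only the indexed terms while docValues keep the raw value. It also assumes that faceting reads docValues. Under those assumptions, facet counts split differently cased values apart, whereas LowerCaseStrField-style storage merges them.

```python
from collections import Counter


def textfield_keyword_lowercase(raw_values):
    """Hypothetical model of TextField + KeywordTokenizer + LowerCaseFilter.

    The analyzer lowercases the indexed terms, but docValues keep the raw input.
    """
    indexed_terms = [v.lower() for v in raw_values]
    doc_values = list(raw_values)
    return indexed_terms, doc_values


def lowercase_str_field(raw_values):
    """Hypothetical model of LowerCaseStrField: both views are lowercased."""
    lowered = [v.lower() for v in raw_values]
    return lowered, list(lowered)


def facet_counts(doc_values):
    # In this model, faceting counts docValues, not the indexed terms.
    return dict(sorted(Counter(doc_values).items()))


rows = ["Boston", "boston", "BOSTON", "Denver"]

_, text_dv = textfield_keyword_lowercase(rows)
_, lower_dv = lowercase_str_field(rows)

print(facet_counts(text_dv))   # {'BOSTON': 1, 'Boston': 1, 'Denver': 1, 'boston': 1}
print(facet_counts(lower_dv))  # {'boston': 3, 'denver': 1}
```

In both models the indexed terms are identical, so term queries behave alike. Only the docValues differ, and that difference is what surfaces in results that rely on them.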
For example, to use LowerCaseStrField on a field in a new index:
cqlsh -e "CREATE SEARCH INDEX ON healthcare.health_data WITH COLUMNS *, birthplace { lowerCase : true };"
The command creates a search index in which the birthplace field uses the LowerCaseStrField field type. The field type is added to the schema automatically. To view the elements in the generated schema XML, you can use a dsetool or cqlsh command.
Examples:
dsetool get_core_schema healthcare.health_data
cqlsh -e "DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON healthcare.health_data;"
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
    <fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField" name="LowerCaseStrField"/>
    <fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
  </types>
  <fields>
    <field indexed="true" multiValued="false" name="grade_completed" type="StrField"/>
    ...
    <field docValues="true" indexed="true" multiValued="false" name="birthplace" type="LowerCaseStrField"/>
    <field docValues="true" indexed="true" multiValued="false" name="income_group" type="TrieIntField"/>
    ...
  </fields>
  <uniqueKey>(id,age)</uniqueKey>
</schema>
To add a new field with the LowerCaseStrField field type to an existing index schema, you can:
- Use the ALTER SEARCH INDEX SCHEMA command in cqlsh.
- Display the current schema with dsetool get_core_schema, edit the XML manually, and then use dsetool write_resource to update the schema with your edited schema XML. Refer to dsetool get_core_schema and dsetool write_resource.
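For the manual-edit route, you can script the XML change itself. The Python sketch below uses only the standard library's `xml.etree.ElementTree`. It assumes you have saved the output of dsetool get_core_schema and that the schema has the `<types>` and `<fields>` sections shown in the example output. The function name `add_lowercase_field` is hypothetical. The field attributes are copied from the generated birthplace field, and you would still upload the result with dsetool write_resource.

```python
import xml.etree.ElementTree as ET

# Class name as it appears in the generated schema.
LOWERCASE_CLASS = "com.datastax.bdp.search.solr.core.types.LowerCaseStrField"


def add_lowercase_field(schema_xml, field_name):
    """Return schema XML with a new LowerCaseStrField field (hypothetical helper).

    Reuses the existing LowerCaseStrField fieldType declaration if there is one,
    otherwise declares it, then appends a <field> with the same attributes that
    the generated schema uses for birthplace.
    """
    root = ET.fromstring(schema_xml)
    type_section = root.find("types")
    field_section = root.find("fields")
    if type_section is None or field_section is None:
        raise ValueError("schema has no <types> or <fields> section")

    lc_type = next(
        (t for t in type_section.findall("fieldType") if t.get("class") == LOWERCASE_CLASS),
        None,
    )
    if lc_type is None:
        lc_type = ET.SubElement(
            type_section, "fieldType", {"class": LOWERCASE_CLASS, "name": "LowerCaseStrField"}
        )

    if any(f.get("name") == field_name for f in field_section.findall("field")):
        raise ValueError(f"field {field_name!r} already exists")

    ET.SubElement(field_section, "field", {
        "docValues": "true",
        "indexed": "true",
        "multiValued": "false",
        "name": field_name,
        "type": lc_type.get("name"),
    })
    # tostring(..., encoding="unicode") returns a str and omits the <?xml ...?> declaration.
    return ET.tostring(root, encoding="unicode")
```

For example, read the saved schema file, pass its contents and "medicalNotes" to `add_lowercase_field`, and write the returned string to the file you upload. Refusing to add a duplicate field name keeps the sketch from silently producing an invalid schema.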
For example, the following cqlsh commands add the new field medicalNotes with the LowerCaseStrField field type (the field type is added to the schema if it does not already exist) and then display the pending schema:
ALTER SEARCH INDEX SCHEMA ON healthcare.health_data ADD lowerCaseString medicalNotes;
DESCRIBE PENDING SEARCH INDEX SCHEMA ON healthcare.health_data;
Remember: Whichever tool you use, cqlsh or dsetool, be sure to RELOAD and REBUILD the search index in each datacenter in the cluster. RELOAD applies the schema changes, and REBUILD reindexes the existing data.
RELOAD SEARCH INDEX ON healthcare.health_data;
REBUILD SEARCH INDEX ON healthcare.health_data;
DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON healthcare.health_data;
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
    <fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField" name="LowerCaseStrField"/>
    <fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
  </types>
  <fields>
    ...
    <field docValues="true" indexed="true" multiValued="false" name="birthplace" type="LowerCaseStrField"/>
    ...
    <field docValues="true" indexed="true" multiValued="false" name="medicalNotes" type="LowerCaseStrField"/>
  </fields>
  <uniqueKey>(id,age)</uniqueKey>
</schema>
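To confirm that a field is active with the expected type, you can check the schema XML programmatically instead of by eye. The hypothetical Python sketch below parses schema XML such as the DESCRIBE ACTIVE output and lists the fields whose type maps to the LowerCaseStrField class. The "..." lines in the example output mark omitted content, so the sample here is a trimmed, well-formed version of that output.

```python
import xml.etree.ElementTree as ET

LOWERCASE_CLASS = "com.datastax.bdp.search.solr.core.types.LowerCaseStrField"


def lowercase_fields(schema_xml):
    """Return the names of fields whose fieldType uses the LowerCaseStrField class."""
    root = ET.fromstring(schema_xml)
    # Resolve type names through <types>, so a differently named fieldType is still found.
    lc_type_names = {
        t.get("name")
        for t in root.iterfind("types/fieldType")
        if t.get("class") == LOWERCASE_CLASS
    }
    return sorted(
        f.get("name")
        for f in root.iterfind("fields/field")
        if f.get("type") in lc_type_names
    )


schema = """<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
    <fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField" name="LowerCaseStrField"/>
  </types>
  <fields>
    <field indexed="true" multiValued="false" name="grade_completed" type="StrField"/>
    <field docValues="true" indexed="true" multiValued="false" name="birthplace" type="LowerCaseStrField"/>
    <field docValues="true" indexed="true" multiValued="false" name="medicalNotes" type="LowerCaseStrField"/>
  </fields>
  <uniqueKey>(id,age)</uniqueKey>
</schema>"""

print(lowercase_fields(schema))  # ['birthplace', 'medicalNotes']
```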
The search query is case insensitive. All query values are converted to lowercase, so queries that differ only in case return the same result. For example, searches for the following values return the same result:
name
Name
NAME
Note: There is a workaround to apply LowerCaseStrField to primary key columns. Use the copyField declaration to copy the primary key field data to a new field that is defined as type LowerCaseStrField. Example:
ALTER SEARCH INDEX SCHEMA ON <table> ADD lowerCaseString key_column_copy;
ALTER SEARCH INDEX SCHEMA ON <table> ADD copyField[@source='key_column', @dest='key_column_copy'];
RELOAD SEARCH INDEX ON <table>;
REBUILD SEARCH INDEX ON <table>;
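If you apply the primary key workaround to several tables, a small helper can generate the four statements in the order shown in the note about primary key columns. In this hypothetical Python sketch, the table and column names are placeholders you supply. The default `_copy` suffix for the destination field is an assumption chosen to match the key_column_copy example. The identifier check is deliberately conservative and rejects names that would need quoting in CQL.

```python
import re

# Conservative check for unquoted CQL identifiers.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def primary_key_copy_statements(table, key_column, suffix="_copy"):
    """Return the CQL statements for the copyField workaround (hypothetical helper).

    table may be keyspace-qualified, for example "healthcare.health_data".
    """
    dest = key_column + suffix
    for part in (*table.split("."), key_column, dest):
        if not _IDENT.match(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return [
        # 1. New field that uses the LowerCaseStrField type.
        f"ALTER SEARCH INDEX SCHEMA ON {table} ADD lowerCaseString {dest};",
        # 2. Copy the primary key column's data into the new field.
        f"ALTER SEARCH INDEX SCHEMA ON {table} "
        f"ADD copyField[@source='{key_column}', @dest='{dest}'];",
        # 3-4. Apply the schema change and reindex; run these in each datacenter.
        f"RELOAD SEARCH INDEX ON {table};",
        f"REBUILD SEARCH INDEX ON {table};",
    ]


if __name__ == "__main__":
    for statement in primary_key_copy_statements("healthcare.health_data", "id"):
        print(statement)
```

Generating the statements, rather than typing them, keeps the source and destination names in the copyField declaration consistent with the field added in the first statement.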