Using LowerCaseStrField with search indexes

Converts data into lowercase and correctly stores it in docValues.

DataStax Enterprise 6.0.8 introduces a custom field type, LowerCaseStrField, which provides the following features:
  • Converts the data into lowercase and correctly stores the lowercase data in docValues.
  • Converts the query values to lowercase.
Note: You cannot apply LowerCaseStrField directly to a table's primary key. You also cannot use any analyzers with LowerCaseStrField.

DataStax advises against using TextField with solr.KeywordTokenizer and solr.LowerCaseFilterFactory. With that combination, the raw data is not stored as lowercase in docValues, contrary to what you might expect, so searches can return unintended results. Instead, use the custom LowerCaseStrField type as described in this topic.
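
To make the docValues issue concrete, the following Python sketch is a toy model only, not DSE or Solr code. It assumes that an operation such as sorting reads the values held in docValues and compares them as plain strings; the birthplace values and the function name sort_by_doc_values are made up for illustration.

```python
# Toy model only: plain Python lists stand in for docValues.
raw_values = ["austin", "Zurich", "Boston"]  # hypothetical birthplace data


def sort_by_doc_values(doc_values):
    """Sort in code-point order, as a plain string comparison would."""
    return sorted(doc_values)


# Raw-case docValues: uppercase letters sort before lowercase ones.
raw_order = sort_by_doc_values(raw_values)

# Lowercased docValues, modelling what LowerCaseStrField is described as storing.
lower_order = sort_by_doc_values([v.lower() for v in raw_values])

print(raw_order)    # ['Boston', 'Zurich', 'austin']
print(lower_order)  # ['austin', 'boston', 'zurich']
```

In this model, only the lowercased values produce the case-insensitive ordering a user would likely expect.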

For example, to use LowerCaseStrField on a field in a new index:
cqlsh -e "CREATE SEARCH INDEX ON healthcare.health_data WITH COLUMNS *, birthplace { lowerCase : true };"
The command creates a search index with birthplace using the LowerCaseStrField field type. The field type is added automatically.

To view the elements in the generated schema XML, you can use a cqlsh or dsetool command.

Examples:
dsetool get_core_schema healthcare.health_data
cqlsh -e "DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON healthcare.health_data;"
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
    <fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField" name="LowerCaseStrField"/>
    <fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
  </types>
  <fields>
    <field indexed="true" multiValued="false" name="grade_completed" type="StrField"/>
    ...
    <field docValues="true" indexed="true" multiValued="false" name="birthplace" type="LowerCaseStrField"/>
    <field docValues="true" indexed="true" multiValued="false" name="income_group" type="TrieIntField"/>
    ...
  </fields>
  <uniqueKey>(id,age)</uniqueKey>
</schema>
To add a new field to an existing index schema with the LowerCaseStrField field type, you can:
  • Use the ALTER SEARCH INDEX SCHEMA command in cqlsh
  • Display the current schema with dsetool get_core_schema, edit the XML manually, and then use dsetool write_resource to update the schema with your edited schema XML. Refer to dsetool get_core_schema and dsetool write_resource.
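
If you take the manual route, the XML edit itself can be scripted. The following Python sketch shows one possible way to do it. It assumes the schema layout shown in the output above. The function name add_lowercase_field and the shortened sample schema are hypothetical. The sketch only transforms an XML string and does not call dsetool.

```python
import xml.etree.ElementTree as ET

# Class name taken from the generated schema shown in this topic.
LCS_CLASS = "com.datastax.bdp.search.solr.core.types.LowerCaseStrField"


def add_lowercase_field(schema_xml: str, field_name: str) -> str:
    """Return schema XML with field_name added as a LowerCaseStrField field."""
    root = ET.fromstring(schema_xml)
    types = root.find("types")
    fields = root.find("fields")

    # Declare the field type once, if the schema does not have it yet.
    if not any(ft.get("name") == "LowerCaseStrField"
               for ft in types.findall("fieldType")):
        ET.SubElement(types, "fieldType",
                      {"class": LCS_CLASS, "name": "LowerCaseStrField"})

    # Add the field itself unless a field with that name already exists.
    if not any(f.get("name") == field_name for f in fields.findall("field")):
        ET.SubElement(fields, "field", {
            "docValues": "true", "indexed": "true", "multiValued": "false",
            "name": field_name, "type": "LowerCaseStrField",
        })
    return ET.tostring(root, encoding="unicode")


# Shortened, hypothetical schema used only to exercise the function.
sample = """<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
  </types>
  <fields>
    <field indexed="true" multiValued="false" name="grade_completed" type="StrField"/>
  </fields>
  <uniqueKey>(id,age)</uniqueKey>
</schema>"""

edited = add_lowercase_field(sample, "medicalNotes")
print(edited)
```

The existence checks make the function safe to run more than once against the same schema text. Review the edited XML before uploading it.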
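For example, in cqlsh, the first of the following commands adds the new field medicalNotes with the LowerCaseStrField field type (the field type itself is added if it does not already exist), and the second displays the pending schema: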
ALTER SEARCH INDEX SCHEMA ON healthcare.health_data ADD lowerCaseString medicalNotes;
DESCRIBE PENDING SEARCH INDEX SCHEMA ON healthcare.health_data;
Remember: Whether you use cqlsh or dsetool, be sure to RELOAD and REBUILD the search index in each datacenter in the cluster.
RELOAD SEARCH INDEX ON healthcare.health_data;
REBUILD SEARCH INDEX ON healthcare.health_data;
DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON healthcare.health_data;
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
    <fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField" name="LowerCaseStrField"/>
    <fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
  </types>
  <fields>
    ...
    <field docValues="true" indexed="true" multiValued="false" name="birthplace" type="LowerCaseStrField"/>
    ...
    <field docValues="true" indexed="true" multiValued="false" name="medicalNotes" type="LowerCaseStrField"/>
  </fields>
  <uniqueKey>(id,age)</uniqueKey>
</schema>
Note: There is a workaround for applying LowerCaseStrField to primary key columns. Use a copyField declaration to copy the primary key field data to a new field that is defined as type LowerCaseStrField. Example:
ALTER SEARCH INDEX SCHEMA ON <table> ADD lowerCaseString key_column_copy;
ALTER SEARCH INDEX SCHEMA ON <table> ADD copyField[@source='key_column', @dest='key_column_copy'];
RELOAD SEARCH INDEX ON <table>;
REBUILD SEARCH INDEX ON <table>;
Search queries on a LowerCaseStrField field are case-insensitive. Query values are converted to lowercase, so queries that differ only in case return the same result. For example, searches for the following values return the same result:
  • name
  • Name
  • NAME
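
The matching behavior can be modelled in a few lines of Python. This is an illustrative sketch, not DSE code. It assumes that both stored values and query values are lowercased before comparison, as described above, and it uses Python's str.lower() as a stand-in for whatever lowercasing rule DSE applies. The row ids, stored values, and function names are hypothetical.

```python
def to_index_term(value: str) -> str:
    """Lowercase a value, modelling what happens to stored data and query values."""
    return value.lower()


# Hypothetical stored values, keyed by a made-up row id.
stored = {1: "Name", 2: "Other"}
index = {row_id: to_index_term(v) for row_id, v in stored.items()}


def search(query: str) -> list:
    """Return ids of rows whose lowercased value equals the lowercased query."""
    term = to_index_term(query)
    return [row_id for row_id, v in index.items() if v == term]


for q in ("name", "Name", "NAME"):
    print(q, search(q))  # each query matches row 1
```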