DSE Search configuration file (solrconfig.xml)

The solrconfig.xml resource file is the primary file for configuring Solr for use with DSE Search.

You can use custom resources or automatically create resources, including the solrconfig.xml file. The solrconfig.xml resource is persisted in the solr_admin.solr_resources database table.

Reload a Solr core after you modify the solrconfig.xml file. Changes apply only to the node where you reload the core.

Parameters

To tune DSE Search, you can modify the following parameters. For full details, see the Apache Solr Reference Guide.

autoSoftCommit 
For live indexing, ensure that the autoSoftCommit time is 100 ms.
<autoSoftCommit>
  <maxTime>100</maxTime>
</autoSoftCommit>
See Configuring and tuning indexing performance.
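As a sketch of where this setting usually sits, assuming the stock Solr layout in which autoSoftCommit is a child of the updateHandler element. The solr.DirectUpdateHandler2 class name comes from the standard Solr example configuration rather than from this page, so check it against your own file:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Soft commit every 100 ms, the value given above for live indexing -->
  <autoSoftCommit>
    <maxTime>100</maxTime>
  </autoSoftCommit>
</updateHandler>
```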
directoryFactory 
The directory factory to use for search indexes. Encryption is enabled per core. To enable encryption for each core, change the class for directoryFactory to EncryptedFSDirectoryFactory.
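A minimal sketch of the class change, assuming the element keeps the name attribute and the solr. class prefix used by the stock Solr directoryFactory entry. Confirm both against the solrconfig.xml generated for your core:

```xml
<!-- Assumed form: only the class value differs from the stock Solr entry -->
<directoryFactory name="DirectoryFactory" class="solr.EncryptedFSDirectoryFactory"/>
```

Because encryption is enabled per core, a change like this would be repeated in the solrconfig.xml of each core that needs it.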
Additional properties are available to tune encryption, although DataStax recommends using the default settings.
dseAllowTokenizedUniqueKey 
By default, a tokenized unique key is not permitted. To disable tokenized key validation, add the dseAllowTokenizedUniqueKey entry and set it to true:
<dseAllowTokenizedUniqueKey>true</dseAllowTokenizedUniqueKey>
dseTypeMappingVersion  
The Solr type mapping version defines how Solr types are mapped to Cassandra Thrift or Cassandra types. Changing the Solr type mapper is rarely required and is not recommended. In particular circumstances, such as converting the Solr LongField to TrieLongField, you can configure dseTypeMappingVersion with the force option. See Changing Solr types and Configuring the Solr type mapping version. Use this option only if you are an expert and have confirmed that the Cassandra internal validation classes of the types involved in the conversion are compatible. To change the type mapping, set force="true":
<dseTypeMappingVersion force="true">1</dseTypeMappingVersion>
After changing the type mapping, you must reload the Solr core with reindexing.
dseUpdateRequestProcessorChain 
Configure a custom URP to extend the Solr UpdateRequestProcessor. See Field input/output (FIT) transformer API.
enableLazyFieldLoading 
Do not change the default value of true:
<enableLazyFieldLoading>true</enableLazyFieldLoading>
A Solr bug SOLR-8858 in earlier versions of Solr restricted changing this field.
fieldInputTransformer 
The field input transformer API is an option to the input/output transformer support in Solr.
fieldOutputTransformer 
The field output transformer API is an option to the input/output transformer support in Solr. See Field input/output (FIT) transformer API and the dev blog post An Introduction to DSE Field Transformers.
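For orientation, a hedged sketch of how a pair of transformers might be declared. The class names com.example.MyFieldInputTransformer and com.example.MyFieldOutputTransformer are hypothetical placeholders for your own implementations, and the name="dse" attribute is an assumption based on the pattern used in DataStax examples. Verify both against the FIT transformer API documentation:

```xml
<!-- Hypothetical classes: replace with your own transformer implementations -->
<fieldInputTransformer name="dse" class="com.example.MyFieldInputTransformer">
</fieldInputTransformer>
<fieldOutputTransformer name="dse" class="com.example.MyFieldOutputTransformer">
</fieldOutputTransformer>
```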
filterCache 
The DSE Search configurable filter cache reliably bounds the filter cache memory usage for a Solr core. This implementation contrasts with the default Solr implementation, which defines bounds for filter cache usage per segment. SolrFilterCache bounding works by evicting cache entries after the configured per-core high watermark is reached, and stopping after the configured low watermark is reached.

SolrFilterCache defaults to offheap. In general, the larger the index, the larger the filter cache should be. A good default is 1 to 2 GB. If the index holds 1 billion docs per node, set the cache to 4 to 5 GB. See Collecting cache statistics.

Set the class attribute of the filterCache element to solr.SolrFilterCache and define the low and high watermark for cache eviction:
<filterCache class="solr.SolrFilterCache" lowWaterMarkMB="1024" highWaterMarkMB="2048" />
Note: SolrFilterCache does not support auto-warming.
indexConfig 
Parameters for tuning index building and configuring re-indexing:
  • deleteApplicationStrategy - Controls how deleted documents are retrieved while deletes are being applied.
    • seekexact - The safest default setting. Uses bloom filters to avoid reading from most segments and works better when memory is limited and the unique key field data doesn't fit into memory.
    • seekceiling - More performant. Can be faster, especially when documents are deleted/inserted into the database with sequential keys. This strategy stops reading from segments where it knows terms can no longer appear.
  • mergedSegmentWarmer - To use warmup segments in DSE Search:
    <mergedSegmentWarmer class="com.datastax.bdp.search.solr.core.TokenSegmentWarmer"/>
  • parallelDeleteTasks - Regulates how many tasks are created to apply deletes in parallel during a soft or hard commit. Supported for RT and NRT indexing. Specify an integer greater than 0. The default value is the number of available processors.

    Leave parallelDeleteTasks at the default value, except when issues occur with write load when running a mixed read/write workload. If writes occasionally spike in utilization and negatively impact your read performance, then set this value lower. To prevent writes from overwhelming reads, reduce this value and max_solr_concurrency_per_core in dse.yaml.

  • ramBufferSizeMB - The size of the RAM buffer. To tune indexing, change the size of the RAM buffer and increase the soft commit time. See Configuring and tuning indexing performance.
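The options above can be sketched together as follows. This assumes that deleteApplicationStrategy and parallelDeleteTasks are written as child elements of indexConfig, in the same way as mergedSegmentWarmer. That element syntax and the value 2 are illustrative assumptions, not settings taken from this page:

```xml
<indexConfig>
  <!-- Assumed element syntax; seekexact is the safer strategy described above -->
  <deleteApplicationStrategy>seekexact</deleteApplicationStrategy>
  <!-- Warm merged segments before they are used -->
  <mergedSegmentWarmer class="com.datastax.bdp.search.solr.core.TokenSegmentWarmer"/>
  <!-- Illustrative value: lowered from the processor-count default to protect reads -->
  <parallelDeleteTasks>2</parallelDeleteTasks>
</indexConfig>
```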
lib 
The location for library files in DataStax Enterprise differs from the location in open source Solr. Although the examples in the solrconfig.xml file indicate support for relative paths, DataStax Enterprise does not support relative path values for the <lib> property, and DSE Search fails to find files in directories that the <lib> property defines. The workaround is to place custom code or Solr contrib modules in the Solr library directories. See Configuring the Solr library path.
maxBooleanClauses 
Defines the maximum number of clauses in a boolean query. If this value is exceeded, an exception is thrown.
After you change the parameter on all search cores, restart the nodes to make the change effective. Reloading the search cores does not make this change effective.
Note: Limitations and Solr known issues apply to DSE Search queries, including:
  • The 1024-clause maxBooleanClauses limit in SOLR-4586.
  • The Solr BooleanQuery MaxClauseCount is a static variable with a single value across the entire JVM. You must change the value on all search cores.
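A minimal sketch of the setting, assuming the stock Solr layout in which maxBooleanClauses is a child of the query element. The value 2048 is illustrative only:

```xml
<query>
  <!-- Illustrative value; use the same value in every search core on the node -->
  <maxBooleanClauses>2048</maxBooleanClauses>
</query>
```

Because the underlying limit is shared across the JVM, a value that differs between cores gives no per-core control, which is why the same value belongs in every core.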
mergeScheduler 
In releases earlier than DSE 5.0.8, the default mergeScheduler settings are not appropriate for DSE Search near real time (NRT) indexing production use on a typical size server. See Configuring and tuning indexing performance.
In DSE 5.0.8 and later,
queryExecutorThreads 
You can set the query executor threads parameter in the solrconfig.xml file to enable multi-threading for filter queries, normal queries, and doc values facets:
<queryExecutorThreads>4</queryExecutorThreads>
See Configuring multi-threaded queries.
queryResponseWriter 
For performance, you can configure DSE Search to parallelize the retrieval of a large number of rows.
<queryResponseWriter name="javabin" class="solr.BinaryResponseWriter">
  <str name="resolverFactory">com.datastax.bdp.search.solr.response.ParallelRowResolver$Factory</str>
</queryResponseWriter>
See Parallelizing large Cassandra row reads.
ramBufferSizeMB 
The default value is 512 MB.
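As a sketch, assuming the stock Solr layout in which ramBufferSizeMB is a child of the indexConfig element, the default would be written explicitly as:

```xml
<indexConfig>
  <!-- 512 MB is the default stated above -->
  <ramBufferSizeMB>512</ramBufferSizeMB>
</indexConfig>
```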
requestHandler 
The correct search handler is required for CQL Solr queries in DSE Search.

When you automatically generate resources, the solrconfig.xml file already contains the request handler for running CQL Solr queries in DSE Search. To run CQL Solr queries using custom resources, the CqlSearchHandler handler is automatically injected:

<requestHandler class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler" name="solr_query" />

For recommendations for the basic configuration for the search handler, and an example that shows adding a search component, see Configuring additional search components.

To configure the Data Import Handler, add a request handler element that contains the location of data-config.xml and data source connection information.
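A hedged sketch of such a request handler, following the layout from the open source Solr Data Import Handler documentation. The /dataimport handler name, the file path, the JDBC driver, the URL, and the credentials are all placeholder assumptions to replace with your own values:

```xml
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- Placeholder path to the import definition file -->
    <str name="config">/path/to/data-config.xml</str>
    <!-- Placeholder data source connection information -->
    <lst name="datasource">
      <str name="driver">com.mysql.jdbc.Driver</str>
      <str name="url">jdbc:mysql://localhost/mydb</str>
      <str name="user">db_username</str>
      <str name="password">db_password</str>
    </lst>
  </lst>
</requestHandler>
```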

For use with the HTTP API only, you can define the default number of rows in the solrconfig.xml file:
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <int name="rows">10</int>
  </lst>
</requestHandler>
rt 

To enable live indexing (also known as RT), add <rt>true</rt> to the <indexConfig> element and configure the options.
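A minimal sketch of the resulting element, with other indexConfig options omitted:

```xml
<indexConfig>
  <!-- Enables live (RT) indexing for this core -->
  <rt>true</rt>
</indexConfig>
```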

See Configuring and tuning indexing performance.
updateHandler 
You can configure per-document or per-field TTL. See Expiring a DSE Search column.