Search index config

Changing search index config

To create and make changes to the search index config, follow these basic steps:

  1. Create a search index. For example:

    CREATE SEARCH INDEX ON demo.health_data;
  2. Alter the search index. For example:

    ALTER SEARCH INDEX CONFIG ON demo.health_data SET autoCommitTime = 30000;
  3. Optionally view the XML of the pending search index. For example:

    DESCRIBE PENDING SEARCH INDEX CONFIG ON demo.health_data;
  4. Make the pending changes active. For example:

    RELOAD SEARCH INDEX ON demo.health_data;
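
The alter, describe, and reload steps can be scripted. The following Python sketch is illustrative only: config_change_statements is a hypothetical helper, not part of any DataStax tool. It only formats the CQL strings shown above; it does not execute them or escape its input.

```python
def config_change_statements(table, shortcut, value):
    """Return the CQL statements for one shortcut change, in the order to run them."""
    return [
        # Stage the change on the pending search index config.
        f"ALTER SEARCH INDEX CONFIG ON {table} SET {shortcut} = {value};",
        # Optional: inspect the pending XML before activating it.
        f"DESCRIBE PENDING SEARCH INDEX CONFIG ON {table};",
        # Make the pending config active.
        f"RELOAD SEARCH INDEX ON {table};",
    ]


statements = config_change_statements("demo.health_data", "autoCommitTime", 30000)
for statement in statements:
    print(statement)
```

The statements would then be run in order in cqlsh or through a driver session.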

Sample search index config

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<config>
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
  <luceneMatchVersion>LUCENE_6_0_0</luceneMatchVersion>
  <dseTypeMappingVersion>2</dseTypeMappingVersion>
  <directoryFactory class="solr.StandardDirectoryFactory" name="DirectoryFactory"/>
  <indexConfig>
    <rt>false</rt>
    <rtOffheapPostings>true</rtOffheapPostings>
    <useCompoundFile>false</useCompoundFile>
    <reopenReaders>true</reopenReaders>
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="maxCommitsToKeep">1</str>
      <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>
    <infoStream file="INFOSTREAM.txt">false</infoStream>
  </indexConfig>
  <jmx/>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoSoftCommit>
      <maxTime>10000</maxTime>
    </autoSoftCommit>
  </updateHandler>
  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>
    <filterCache class="solr.SolrFilterCache" highWaterMarkMB="2048" lowWaterMarkMB="1024"/>
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
    <useColdSearcher>true</useColdSearcher>
    <maxWarmingSearchers>16</maxWarmingSearchers>
  </query>
  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000"/>
    <httpCaching never304="true"/>
  </requestDispatcher>
  <requestHandler class="solr.SearchHandler" default="true" name="search">
    <lst name="defaults">
      <int name="rows">10</int>
    </lst>
  </requestHandler>
  <requestHandler class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler" name="solr_query">
    <lst name="defaults">
      <int name="rows">10</int>
    </lst>
  </requestHandler>
  <requestHandler class="solr.UpdateRequestHandler" name="/update"/>
  <requestHandler class="solr.UpdateRequestHandler" name="/update/csv" startup="lazy"/>
  <requestHandler class="solr.UpdateRequestHandler" name="/update/json" startup="lazy"/>
  <requestHandler class="solr.FieldAnalysisRequestHandler" name="/analysis/field" startup="lazy"/>
  <requestHandler class="solr.DocumentAnalysisRequestHandler" name="/analysis/document" startup="lazy"/>
  <requestHandler class="solr.admin.AdminHandlers" name="/admin/"/>
  <requestHandler class="solr.PingRequestHandler" name="/admin/ping">
    <lst name="invariants">
      <str name="qt">search</str>
      <str name="q">solrpingquery</str>
    </lst>
    <lst name="defaults">
      <str name="echoParams">all</str>
    </lst>
  </requestHandler>
  <requestHandler class="solr.DumpRequestHandler" name="/debug/dump">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="echoHandler">true</str>
    </lst>
  </requestHandler>
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>
</config>
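
When reviewing the XML returned by a DESCRIBE statement, it can help to pull out only the values of interest. The following Python sketch parses a shortened copy of the sample config with the standard library. The summarize_config helper and the dictionary keys are hypothetical names chosen for illustration, and the excerpt omits most elements of the real config.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the sample config above; other elements are omitted.
CONFIG_XML = """
<config>
  <indexConfig>
    <rt>false</rt>
  </indexConfig>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoSoftCommit>
      <maxTime>10000</maxTime>
    </autoSoftCommit>
  </updateHandler>
  <query>
    <filterCache class="solr.SolrFilterCache" highWaterMarkMB="2048" lowWaterMarkMB="1024"/>
  </query>
</config>
"""


def summarize_config(xml_text):
    """Extract a few settings from a search index config document."""
    root = ET.fromstring(xml_text.strip())
    cache = root.find("query/filterCache")
    return {
        "autoSoftCommitMaxTimeMs": int(root.findtext("updateHandler/autoSoftCommit/maxTime")),
        "rt": root.findtext("indexConfig/rt") == "true",
        "filterCacheHighWaterMarkMB": int(cache.get("highWaterMarkMB")),
        "filterCacheLowWaterMarkMB": int(cache.get("lowWaterMarkMB")),
    }


summary = summarize_config(CONFIG_XML)
print(summary)
```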

For CQL index management, use configuration element shortcuts with CQL commands.

Configuration elements are listed alphabetically by shortcut.

The XML element is shown with the element start tag. An ellipsis indicates that other elements or attributes are not shown.

autoCommitTime

Defines the time interval between updates to the search index with the most recent data after an INSERT, UPDATE, or DELETE. By default, changes are automatically committed every 10000 milliseconds. To change the time interval between updates:

  1. Set auto commit time on the pending search index:

    ALTER SEARCH INDEX CONFIG ON wiki.solr SET autoCommitTime = 30000;
  2. You can view the pending search config:

    DESCRIBE PENDING SEARCH INDEX CONFIG ON wiki.solr;

    The resulting XML shows the maximum time between updates is 30000 milliseconds:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoSoftCommit>
        <maxTime>30000</maxTime>
      </autoSoftCommit>
    </updateHandler>
  3. To make the pending changes active, reload the search index:

    RELOAD SEARCH INDEX ON wiki.solr;

defaultQueryField

Name of the default field to query. Default not set. To set the field to use when no field is specified by the query, see Setting up default query field.

directoryFactory

The directory factory to use for search indexes. Encryption is enabled per search index. To enable encryption for a search index, change the class for directoryFactory to EncryptedFSDirectoryFactory.

  1. Enable encryption on the pending search index:

    ALTER SEARCH INDEX CONFIG ON wiki.solr SET directoryFactory = EncryptedFSDirectoryFactory;
  2. You can view the pending search config:

    DESCRIBE PENDING SEARCH INDEX CONFIG ON wiki.solr;

    The resulting XML shows that encryption is enabled:

    <directoryFactory class="solr.EncryptedFSDirectoryFactory" name="DirectoryFactory"/>
  3. To make the pending changes active, reload the search index:

    RELOAD SEARCH INDEX ON wiki.solr;

Even though additional properties are available to tune encryption, DataStax recommends using the default settings.

filterCacheLowWaterMark

Default is 1024 MB. See below.

filterCacheHighWaterMark

Default is 2048 MB.

The DSE Search configurable filter cache reliably bounds the filter cache memory usage for a search index. This implementation contrasts with the default Solr implementation, which defines bounds for filter cache usage per segment. SolrFilterCache bounding works by evicting cache entries once the configured per search index (per core) high watermark is reached, and stopping once the configured low watermark is reached.

  • The filter cache is cleared when the search index is reloaded.

  • SolrFilterCache does not support auto-warming.

SolrFilterCache defaults to offheap. In general, the larger the index, the larger the filter cache should be. A good default is 1 to 2 GB. For an index of 1 billion documents per node, set the filter cache to 4 to 5 GB.

  1. To change cache eviction for a large index, set the low and high values one at a time:

    ALTER SEARCH INDEX CONFIG ON wiki.solr SET filterCacheHighWaterMark = 5000;
    ALTER SEARCH INDEX CONFIG ON wiki.solr SET filterCacheLowWaterMark = 2000;
  2. View the pending search index config:

    <query>
    ...
        <filterCache class="solr.SolrFilterCache" highWaterMarkMB="5000" lowWaterMarkMB="2000"/>
    ...
    </query>
  3. To make the pending changes active, reload the search index:

    RELOAD SEARCH INDEX ON wiki.solr;
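
The high and low watermark behavior described above can be pictured with a toy model. The following Python sketch is not the DSE implementation: WatermarkCache is a hypothetical class, entry sizes are supplied by the caller, and evicting the oldest entry first is an assumption, because the eviction order is not stated here.

```python
from collections import OrderedDict


class WatermarkCache:
    """Toy model of high/low watermark eviction (illustration only)."""

    def __init__(self, high_mb, low_mb):
        if low_mb > high_mb:
            raise ValueError("low watermark must not exceed high watermark")
        self.high_mb = high_mb
        self.low_mb = low_mb
        self.entries = OrderedDict()  # key -> size in MB, oldest first
        self.used_mb = 0

    def put(self, key, size_mb):
        # Replace an existing entry so its size is not counted twice.
        if key in self.entries:
            self.used_mb -= self.entries.pop(key)
        self.entries[key] = size_mb
        self.used_mb += size_mb
        # Eviction starts only when usage reaches the high watermark ...
        if self.used_mb >= self.high_mb:
            # ... and stops when usage is at or below the low watermark.
            while self.entries and self.used_mb > self.low_mb:
                _, evicted_mb = self.entries.popitem(last=False)  # assumption: oldest first
                self.used_mb -= evicted_mb


cache = WatermarkCache(high_mb=2048, low_mb=1024)
for i in range(5):
    cache.put(f"filter-{i}", 512)
print(cache.used_mb, list(cache.entries))
```

In this model the fourth entry brings usage to the high watermark, which evicts the two oldest entries; the fifth entry then fits without further eviction. A wider gap between the two watermarks means eviction runs less often but frees more memory each time.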

mergeFactor

When a new segment causes the number of lowest-level segments to exceed the merge factor value, then those segments are merged together to form a single large segment. When the merge factor is 10, each merge results in the creation of a single segment that is about ten times larger than each of its ten constituents. When there are 10 of these larger segments, then they in turn are merged into an even larger single segment. Default is 10.

  1. To change the number of segments to merge at one time:

    ALTER SEARCH INDEX CONFIG ON wiki.solr SET mergeFactor = 5;
  2. View the pending search index config:

    <indexConfig>
    ...
        <mergeFactor>5</mergeFactor>
    ...
    </indexConfig>
  3. To make the pending changes active, reload the search index:

    RELOAD SEARCH INDEX ON wiki.solr;
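
To get a feel for how the merge factor shapes the index, the cascade described above can be simulated. The following Python sketch is a simplified model, not Lucene's merge logic: simulate_merges is a hypothetical function, every flush is assumed to produce an equally sized segment, and a level is merged as soon as it holds merge_factor segments.

```python
def simulate_merges(flushes, merge_factor):
    """Return {level: segment_count} after `flushes` equally sized flushes.

    A segment at level L is roughly merge_factor ** L times the size of a
    freshly flushed segment.
    """
    levels = {}
    for _ in range(flushes):
        level = 0
        levels[level] = levels.get(level, 0) + 1
        # Cascade: a full level collapses into one segment on the next level.
        while levels[level] == merge_factor:
            levels[level] = 0
            level += 1
            levels[level] = levels.get(level, 0) + 1
    return {lvl: count for lvl, count in levels.items() if count}


with_factor_10 = simulate_merges(25, merge_factor=10)
with_factor_5 = simulate_merges(25, merge_factor=5)
print(with_factor_10)
print(with_factor_5)
```

In this model, 25 flushes with a merge factor of 10 leave five small segments and two segments that are ten times larger, while a merge factor of 5 leaves a single segment 25 times larger. A lower merge factor therefore means fewer segments to search at the cost of more merge work during indexing.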

mergeMaxThreadCount

Must be configured together with mergeMaxMergeCount. The number of concurrent merges that Lucene can perform for the search index. The default mergeScheduler settings are set automatically. Do not adjust this setting.

Default: ½ the number of tpc_cores

mergeMaxMergeCount

Must be configured together with mergeMaxThreadCount. The number of pending merges (active and in the backlog) that can accumulate before segment merging starts to block or throttle incoming writes. The default mergeScheduler settings are set automatically. Do not adjust this setting.

Default: 2x the mergeMaxThreadCount

ramBufferSize

The index RAM buffer size in megabytes (MB). The RAM buffer holds uncommitted documents. A larger RAM buffer reduces flushes, and segments are larger when flushed. Fewer flushes reduce I/O pressure, which is ideal for higher write workload scenarios.

For example, adjust the ramBufferSize when you configure live indexing:

ALTER SEARCH INDEX CONFIG ON wiki.solr SET autoCommitTime = 100;
ALTER SEARCH INDEX CONFIG ON wiki.solr SET realtime = true;
ALTER SEARCH INDEX CONFIG ON wiki.solr SET ramBufferSize = 2048;
RELOAD SEARCH INDEX ON wiki.solr ;

Default: 512

realtime

Enables live indexing to increase indexing throughput. Enable live indexing on only one search index per cluster. Live indexing, also called real-time (RT) indexing, supports searching directly against the Lucene RAM buffer and more frequent, cheaper soft commits, which provide earlier visibility to newly indexed data.

Live indexing requires a larger RAM buffer and more memory usage than an otherwise equivalent NRT setup. See Tune RT indexing.

Configuration elements without shortcuts

To specify configuration elements that do not have shortcuts, you can specify the XML path to the setting and separate child elements using a period.

deleteApplicationStrategy

Controls how deleted documents are retrieved when deletes are applied. seekexact is the safe default and the right choice for most deployments; seekceiling can give slightly better performance.

Valid case-insensitive values are:

  • seekexact

    Uses bloom filters to avoid reading from most segments. Use when memory is limited and the unique key field data does not fit into memory.

  • seekceiling

    More performant when documents are deleted/inserted into the database with sequential keys, because this strategy can stop reading from segments when it is known that terms can no longer appear.

Default: seekexact
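
The early stop that makes seekceiling attractive for sequential keys can be shown with a toy model. The following Python sketch is not DSE or Lucene code: apply_deletes_seekceiling is a hypothetical function that treats a segment as a sorted list of unique key terms and processes the delete keys in sorted order.

```python
from bisect import bisect_left


def apply_deletes_seekceiling(segment_terms, delete_keys):
    """Return (matched keys, lookups performed) for a sorted list of segment terms."""
    matched = []
    lookups = 0
    for key in sorted(delete_keys):
        lookups += 1
        # Ceiling lookup: position of the first term >= key.
        pos = bisect_left(segment_terms, key)
        if pos == len(segment_terms):
            # No term >= key exists, so no larger key can appear in this
            # segment either; stop reading from it.
            break
        if segment_terms[pos] == key:
            matched.append(key)
    return matched, lookups


segment = ["k01", "k02", "k03"]
matched, lookups = apply_deletes_seekceiling(segment, ["k02", "k07", "k08", "k09"])
print(matched, lookups)
```

Here the segment is abandoned after the second lookup, because k07 is already past its last term. An exact-match strategy would instead check each of the four keys against the segment.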

mergePolicyFactory

The AutoExpungeDeletesTieredMergePolicy custom merge policy is based on TieredMergePolicy. This policy cleans up the large segments by merging them when deletes reach the percentage threshold. A single auto expunge merge occurs at a time. Use for large indexes that are not merging the largest segments due to deletes. To determine whether this merge setting is appropriate for your workflow, view the segments on the Solr Segment Info screen.

When set, the XML looks like this:

<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory">
    <int name="maxMergedSegmentMB">5000</int>
    <int name="forceMergeDeletesPctAllowed">25</int>
    <bool name="mergeSingleSegments">true</bool>
  </mergePolicyFactory>
</indexConfig>
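
As a rough illustration of the percentage threshold, the following Python sketch selects segments whose share of deleted documents has reached forceMergeDeletesPctAllowed. It is a simplified model, not the policy's actual selection logic: segments_to_expunge and the segment tuples are hypothetical, and treating the threshold as inclusive is an assumption.

```python
def segments_to_expunge(segments, pct_allowed=25):
    """Return names of segments whose deleted-document share reaches the threshold.

    `segments` is an iterable of (name, max_docs, deleted_docs) tuples with
    max_docs > 0.
    """
    eligible = []
    for name, max_docs, deleted_docs in segments:
        deleted_pct = 100.0 * deleted_docs / max_docs
        if deleted_pct >= pct_allowed:  # assumption: threshold is inclusive
            eligible.append(name)
    return eligible


segments = [
    ("_a", 1_000_000, 100_000),    # 10% deleted
    ("_b", 2_000_000, 500_000),    # 25% deleted
    ("_c", 4_000_000, 1_600_000),  # 40% deleted
]
eligible = segments_to_expunge(segments)
print(eligible)
```

Comparing such a list against the Solr Segment Info screen is one way to judge whether large segments are holding on to deletes.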

To extend TieredMergePolicy to support automatic removal of deletes:

  1. To enable automatic removal of deletes, set the custom policy:

    ALTER SEARCH INDEX CONFIG ON wiki.solr SET indexConfig.mergePolicyFactory[@class='org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory'].bool[@name='mergeSingleSegments'] = true;
  2. Set the maximum segment size in MB:

    ALTER SEARCH INDEX CONFIG ON wiki.solr SET indexConfig.mergePolicyFactory[@class='org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory'].int[@name='maxMergedSegmentMB'] = 5000;
  3. Set the percentage threshold for deleting from the large segments:

    ALTER SEARCH INDEX CONFIG ON wiki.solr SET indexConfig.mergePolicyFactory[@class='org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory'].int[@name='forceMergeDeletesPctAllowed'] = 25;

    If mergeFactor is in the existing index config, you must drop it from the search index before you alter the table to support automatic removal of deletes:

    ALTER SEARCH INDEX CONFIG ON wiki.solr DROP indexConfig.mergePolicyFactory;
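
The element paths used in these statements follow a regular pattern: element names joined by periods, with [@attribute='value'] filters where an element must be selected by attribute. The following Python sketch builds such a path and the matching statement. The element_path and alter_set helpers are hypothetical and only format strings; they do not validate the path against the config or escape their input.

```python
def element_path(*steps):
    """Join steps into a dotted path.

    Each step is either an element name or an (element name, attributes) pair;
    attributes become [@name='value'] filters.
    """
    parts = []
    for step in steps:
        name, attrs = (step, {}) if isinstance(step, str) else step
        filters = "".join(f"[@{key}='{value}']" for key, value in attrs.items())
        parts.append(name + filters)
    return ".".join(parts)


def alter_set(table, path, value):
    """Format an ALTER SEARCH INDEX CONFIG ... SET statement for an element path."""
    return f"ALTER SEARCH INDEX CONFIG ON {table} SET {path} = {value};"


FACTORY = "org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory"
path = element_path(
    "indexConfig",
    ("mergePolicyFactory", {"class": FACTORY}),
    ("int", {"name": "maxMergedSegmentMB"}),
)
statement = alter_set("wiki.solr", path, 5000)
print(statement)
```

The printed statement matches the one used above to set the maximum segment size.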

parallelDeleteTasks

Regulates how many tasks are created to apply deletes in parallel during a soft or hard commit. Supported for RT and NRT indexing. Specify an integer greater than 0.

Leave parallelDeleteTasks at the default value, except when issues occur with write load when running a mixed read/write workload. If writes occasionally spike in utilization and negatively impact your read performance, then set this value lower.

Default: the number of available processors
