Search index configuration
DataStax recommends using the CQL commands CREATE SEARCH INDEX and ALTER SEARCH INDEX CONFIG to manage search index configuration.
You can also use dsetool commands to manage search indexes.
Change the search index configuration
To create and alter a search index configuration, do the following:
-
Create a search index:
CREATE SEARCH INDEX ON demo.health_data;
-
Alter the search index. The change is pending until you reload the search index.
ALTER SEARCH INDEX CONFIG ON demo.health_data SET autoCommitTime = 30000;
-
Optional: View the XML of the pending, altered search index:
DESCRIBE PENDING SEARCH INDEX CONFIG ON demo.health_data;
-
Apply the pending changes:
RELOAD SEARCH INDEX ON demo.health_data;
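As a rough illustration of the sequence above, the following Python sketch assembles the CQL statements for one or more configuration changes in the order the procedure describes: alter, inspect the pending configuration, then reload. The helper names `config_change_statements` and `_cql_literal` are hypothetical and not part of any DataStax tool. The code only builds strings and doesn't connect to a cluster.

```python
def _cql_literal(value):
    """Format a Python value for use in a SET clause (hypothetical helper)."""
    # CQL booleans are lowercase; Python's str(True) would give "True".
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def config_change_statements(table, settings):
    """Return the CQL statements that change search index settings on `table`.

    `settings` maps a shortcut name (for example, autoCommitTime) to its new
    value. Nothing is executed; the caller decides how to run the statements.
    """
    statements = [
        f"ALTER SEARCH INDEX CONFIG ON {table} SET {name} = {_cql_literal(value)};"
        for name, value in settings.items()
    ]
    # Changes stay pending until the index is reloaded, so inspect them first.
    statements.append(f"DESCRIBE PENDING SEARCH INDEX CONFIG ON {table};")
    statements.append(f"RELOAD SEARCH INDEX ON {table};")
    return statements


if __name__ == "__main__":
    for statement in config_change_statements(
        "demo.health_data", {"autoCommitTime": 30000}
    ):
        print(statement)
```

Batching several settings before a single reload mirrors the procedure: each ALTER only stages a pending change, and one RELOAD applies them together.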
Search index configuration syntax
The following snippet is an example of a search index configuration in XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<config>
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
  <luceneMatchVersion>LUCENE_6_0_0</luceneMatchVersion>
  <dseTypeMappingVersion>2</dseTypeMappingVersion>
  <directoryFactory class="solr.StandardDirectoryFactory" name="DirectoryFactory"/>
  <indexConfig>
    <rt>false</rt>
    <rtOffheapPostings>true</rtOffheapPostings>
    <useCompoundFile>false</useCompoundFile>
    <reopenReaders>true</reopenReaders>
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="maxCommitsToKeep">1</str>
      <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>
    <infoStream file="INFOSTREAM.txt">false</infoStream>
  </indexConfig>
  <jmx/>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoSoftCommit>
      <maxTime>10000</maxTime>
    </autoSoftCommit>
  </updateHandler>
  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>
    <filterCache class="solr.SolrFilterCache" highWaterMarkMB="2048" lowWaterMarkMB="1024"/>
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
    <useColdSearcher>true</useColdSearcher>
    <maxWarmingSearchers>16</maxWarmingSearchers>
  </query>
  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000"/>
    <httpCaching never304="true"/>
  </requestDispatcher>
  <requestHandler class="solr.SearchHandler" default="true" name="search">
    <lst name="defaults">
      <int name="rows">10</int>
    </lst>
  </requestHandler>
  <requestHandler class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler" name="solr_query">
    <lst name="defaults">
      <int name="rows">10</int>
    </lst>
  </requestHandler>
  <requestHandler class="solr.UpdateRequestHandler" name="/update"/>
  <requestHandler class="solr.UpdateRequestHandler" name="/update/csv" startup="lazy"/>
  <requestHandler class="solr.UpdateRequestHandler" name="/update/json" startup="lazy"/>
  <requestHandler class="solr.FieldAnalysisRequestHandler" name="/analysis/field" startup="lazy"/>
  <requestHandler class="solr.DocumentAnalysisRequestHandler" name="/analysis/document" startup="lazy"/>
  <requestHandler class="solr.admin.AdminHandlers" name="/admin/"/>
  <requestHandler class="solr.PingRequestHandler" name="/admin/ping">
    <lst name="invariants">
      <str name="qt">search</str>
      <str name="q">solrpingquery</str>
    </lst>
    <lst name="defaults">
      <str name="echoParams">all</str>
    </lst>
  </requestHandler>
  <requestHandler class="solr.DumpRequestHandler" name="/debug/dump">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="echoHandler">true</str>
    </lst>
  </requestHandler>
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>
</config>
The examples that follow use XML snippets to illustrate specific configuration elements. Each snippet shows the element with its start tag. An ellipsis (...) indicates that other elements or attributes aren’t shown in the example.
Search index configuration elements with shortcuts
For CQL index management, use configuration element shortcuts with CQL commands.
In the following reference, the configuration elements are listed alphabetically by shortcut:
- autoCommitTime
-
Defines the time interval between updates to the search index with the most recent data after an INSERT, UPDATE, or DELETE. By default, changes are automatically committed every 10000 milliseconds. For more information, see Tune DSE Search for maximum indexing throughput.
-
To change the time interval between updates, use ALTER SEARCH INDEX CONFIG to set autoCommitTime:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET autoCommitTime = 30000;
-
View the pending search config:
DESCRIBE PENDING SEARCH INDEX CONFIG ON wiki.solr;
-
Verify that the returned XML shows the new maximum time between updates (30000 milliseconds, for this example):
...
<updateHandler class="solr.DirectUpdateHandler2">
  <autoSoftCommit>
    <maxTime>30000</maxTime>
  </autoSoftCommit>
</updateHandler>
...
-
Reload the search index to apply the pending changes:
RELOAD SEARCH INDEX ON wiki.solr;
- defaultQueryField
-
Name of the default field to query when no field is specified in the query. For more information, see Set default query field.
Default: Not set.
- directoryFactory
-
The directory factory to use for search indexes. Encryption is enabled per search index. To enable encryption for a search index, change the class for directoryFactory to EncryptedFSDirectoryFactory.
-
Use ALTER SEARCH INDEX CONFIG to enable encryption on a search index:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET directoryFactory = EncryptedFSDirectoryFactory;
-
View the pending search config:
DESCRIBE PENDING SEARCH INDEX CONFIG ON wiki.solr;
-
Verify that the returned XML shows that encryption is enabled:
<directoryFactory class="solr.EncryptedFSDirectoryFactory" name="DirectoryFactory"/>
-
Reload the search index to apply the changes:
RELOAD SEARCH INDEX ON wiki.solr;
Although additional properties are available to tune encryption, DataStax recommends using the default settings.
- filterCacheLowWaterMark
-
Default: 1024 MB
See filterCacheHighWaterMark.
- filterCacheHighWaterMark
-
Default: 2048 MB
The DSE Search configurable filter cache reliably bounds the filter cache memory usage for a search index. This implementation contrasts with the default Solr implementation, which defines bounds for filter cache usage per segment.
SolrFilterCache bounding works by evicting cache entries after the configured per-search-index (per-core) high watermark is reached, and stopping once the configured low watermark is reached.
The filter cache is cleared when the search index is reloaded.
SolrFilterCache doesn’t support auto-warming.
SolrFilterCache defaults to offheap. In general, the larger the index, the larger the filter cache should be. A good default is 1 to 2 GB. If the index holds 1 billion documents per node, set the filter cache to 4 to 5 GB.
-
To change cache eviction for a large index, set the low and high values individually:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET filterCacheHighWaterMark = 5000;
ALTER SEARCH INDEX CONFIG ON wiki.solr SET filterCacheLowWaterMark = 2000;
-
View the pending search index config:
DESCRIBE PENDING SEARCH INDEX CONFIG ON wiki.solr;
-
Verify that the returned XML shows the new filter cache settings:
<query>
  ...
  <filterCache class="solr.SolrFilterCache" highWaterMarkMB="5000" lowWaterMarkMB="2000"/>
  ...
</query>
-
Reload the search index to apply the changes:
RELOAD SEARCH INDEX ON wiki.solr;
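The high and low watermark behavior described for filterCacheHighWaterMark can be pictured with a toy Python model. This is only a sketch: the class name `WatermarkCache` is hypothetical, and oldest-first eviction is an assumption made for illustration, because the eviction order that SolrFilterCache uses isn't stated here.

```python
from collections import OrderedDict


class WatermarkCache:
    """Toy cache bounded by high and low watermarks (all sizes in MB).

    Hypothetical illustration; oldest-first eviction is an assumption.
    """

    def __init__(self, high_mb=2048, low_mb=1024):
        if low_mb > high_mb:
            raise ValueError("low watermark must not exceed high watermark")
        self.high_mb = high_mb
        self.low_mb = low_mb
        self.entries = OrderedDict()  # key -> size in MB, oldest first
        self.used_mb = 0

    def put(self, key, size_mb):
        # Replace any existing entry stored under the same key.
        self.used_mb -= self.entries.pop(key, 0)
        self.entries[key] = size_mb
        self.used_mb += size_mb
        # Eviction starts once usage reaches the high watermark ...
        if self.used_mb >= self.high_mb:
            # ... and stops once usage is at or below the low watermark.
            while self.entries and self.used_mb > self.low_mb:
                _, evicted_mb = self.entries.popitem(last=False)
                self.used_mb -= evicted_mb


if __name__ == "__main__":
    cache = WatermarkCache(high_mb=2048, low_mb=1024)
    for i in range(4):
        cache.put(f"filter{i}", 512)
        print(i, cache.used_mb, list(cache.entries))
```

In this model, the gap between the two watermarks determines how much is evicted in one pass: a wider gap means fewer but larger eviction runs.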
- mergeFactor
-
When a new segment causes the number of lowest-level segments to exceed the merge factor value, then those segments are merged together to form a single large segment. When the merge factor is 10, each merge results in the creation of a single segment that is about ten times larger than each of its ten constituents. When there are 10 of these larger segments, then they in turn are merged into an even larger single segment.
Default: 10
-
Use ALTER SEARCH INDEX CONFIG to change the number of segments to merge at one time:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET mergeFactor = 5;
-
View the pending search index config:
DESCRIBE PENDING SEARCH INDEX CONFIG ON wiki.solr;
-
Verify that the returned XML shows the new merge factor setting:
<indexConfig>
  ...
  <mergeFactor>5</mergeFactor>
  ...
</indexConfig>
-
Reload the search index to apply the changes:
RELOAD SEARCH INDEX ON wiki.solr;
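The cascading merges described for mergeFactor can be sketched with a short Python simulation. It is a simplified model, not the actual Lucene merge logic: it assumes that a level is merged as soon as it holds mergeFactor segments, that all flushed segments are the same size, and that deletes play no role. The function name `segments_per_level` is hypothetical.

```python
def segments_per_level(flushed_segments, merge_factor=10):
    """Count the segments at each level after a number of flushes.

    Index 0 holds the smallest (freshly flushed) segments. A segment at
    level k is roughly merge_factor**k times the size of a flushed one.
    Simplified model: real merge policies also weigh sizes and deletes.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    levels = [0]
    for _ in range(flushed_segments):
        levels[0] += 1
        level = 0
        # Cascade: a full level collapses into one segment on the next level.
        while levels[level] >= merge_factor:
            levels[level] -= merge_factor
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            level += 1
    return levels


if __name__ == "__main__":
    for factor in (10, 5):
        print(factor, segments_per_level(100, merge_factor=factor))
```

Under these assumptions, a lower merge factor leaves fewer segments on disk at any moment but triggers merges more often, which is the trade-off behind changing the setting.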
- mergeMaxThreadCount
-
Don’t adjust this setting. The default mergeScheduler settings are set automatically.
This parameter sets the number of concurrent merges that Lucene can perform for the search index. It is paired with mergeMaxMergeCount.
Default: Half the number of tpc_cores
- mergeMaxMergeCount
-
Don’t adjust this setting. The default mergeScheduler settings are set automatically.
This parameter sets the number of pending merges (active and in the backlog) that can accumulate before segment merging starts to block/throttle incoming writes.
It is paired with mergeMaxThreadCount.
Default: Twice the value of mergeMaxThreadCount
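To see how the two defaults relate, the following Python sketch derives both values from tpc_cores using the rules stated for mergeMaxThreadCount and mergeMaxMergeCount. The function name `merge_scheduler_defaults` is hypothetical, and two details are assumptions made for illustration: "half" is taken as integer division, and the thread count is floored at 1.

```python
def merge_scheduler_defaults(tpc_cores):
    """Return (mergeMaxThreadCount, mergeMaxMergeCount) for a tpc_cores value.

    Assumptions for illustration: "half" rounds down, with a minimum of one
    merge thread. The product computes these values automatically.
    """
    if tpc_cores < 1:
        raise ValueError("tpc_cores must be at least 1")
    merge_max_thread_count = max(1, tpc_cores // 2)
    # The pending-merge limit is twice the number of merge threads.
    merge_max_merge_count = 2 * merge_max_thread_count
    return merge_max_thread_count, merge_max_merge_count


if __name__ == "__main__":
    for cores in (1, 7, 8, 16):
        print(cores, merge_scheduler_defaults(cores))
```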
- ramBufferSize
-
The index RAM buffer size in megabytes (MB).
Default: 512
The RAM buffer holds uncommitted documents. A larger RAM buffer reduces flushes. Segments are also larger when flushed. Fewer flushes reduce I/O pressure, which is ideal for higher write workload scenarios.
For example, you can adjust the ramBufferSize when you configure live indexing:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET autoCommitTime = 100;
ALTER SEARCH INDEX CONFIG ON wiki.solr SET realtime = true;
ALTER SEARCH INDEX CONFIG ON wiki.solr SET ramBufferSize = 2048;
RELOAD SEARCH INDEX ON wiki.solr;
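The effect of the buffer size on flush frequency can be estimated with simple arithmetic. The Python sketch below is a back-of-the-envelope model only: it assumes a flush happens each time the buffer fills and ignores flushes triggered by commits and any per-document overhead. The function name `estimated_flushes` is hypothetical.

```python
import math


def estimated_flushes(pending_mb, ram_buffer_mb=512):
    """Estimate buffer-full flushes for `pending_mb` of uncommitted data.

    Simplified model: one flush each time the RAM buffer fills. Flushes
    caused by commits are not counted.
    """
    if ram_buffer_mb <= 0:
        raise ValueError("ram_buffer_mb must be greater than 0")
    return math.ceil(pending_mb / ram_buffer_mb)


if __name__ == "__main__":
    for buffer_mb in (512, 2048):
        print(buffer_mb, estimated_flushes(4096, ram_buffer_mb=buffer_mb))
```

Under this model, the same 4096 MB of incoming data produces a quarter as many flushes with a 2048 MB buffer as with the 512 MB default, and each flushed segment is correspondingly larger.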
- realtime
-
Enables live indexing to increase indexing throughput.
Enable live indexing on only one search index per cluster.
Live indexing, also called real-time (RT) indexing, supports searching directly against the Lucene RAM buffer with more frequent, cheaper soft-commits that provide earlier visibility to newly indexed data.
Live indexing requires a larger RAM buffer (ramBufferSize) and uses more memory than an otherwise equivalent near-real-time (NRT) setup. For more information, see Tune RT indexing.
Search index configuration elements without shortcuts
To set configuration elements that don’t have shortcuts, specify the XML path to the setting, separating child elements with periods.
- deleteApplicationStrategy
-
Controls how to retrieve deleted documents when deletes are being applied. seekexact is the safe default for most use cases, but for a little extra performance you can try seekceiling.
Valid case-insensitive values are as follows:
-
seekexact (default, recommended): Uses bloom filters to avoid reading from most segments. Use when memory is limited and the unique key field data does not fit into memory.
-
seekceiling: More performant when documents are deleted/inserted into the database with sequential keys because this strategy can stop reading from segments when it is known that terms can no longer appear.
- mergePolicyFactory
-
The AutoExpungeDeletesTieredMergePolicy custom merge policy is based on TieredMergePolicy. This policy cleans up the large segments by merging them when deletes reach the percentage threshold. A single auto expunge merge occurs at a time.
Use for large indexes that are not merging the largest segments due to deletes. To determine whether this merge setting is appropriate for your workflow, view the segments in Solr Segment Info.
For example, assume you have the following search index configuration:
<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory">
    <int name="maxMergedSegmentMB">5000</int>
    <int name="forceMergeDeletesPctAllowed">25</int>
    <bool name="mergeSingleSegments">true</bool>
  </mergePolicyFactory>
</indexConfig>
-
To extend TieredMergePolicy to enable automatic removal of deletes, set the custom policy with ALTER SEARCH INDEX CONFIG:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET indexConfig.mergePolicyFactory[@class='org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory'].bool[@name='mergeSingleSegments'] = true;
-
Set the maximum segment size in MB:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET indexConfig.mergePolicyFactory[@class='org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory'].int[@name='maxMergedSegmentMB'] = 5000;
-
Set the percentage threshold for deleting from the large segments:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET indexConfig.mergePolicyFactory[@class='org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory'].int[@name='forceMergeDeletesPctAllowed'] = 25;
-
If mergeFactor is in the existing index config, you must drop it from the search index before you alter the table to support automatic removal of deletes:
ALTER SEARCH INDEX CONFIG ON wiki.solr DROP indexConfig.mergePolicyFactory;
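The element paths in the mergePolicyFactory statements follow a regular pattern: element names joined by periods, with an attribute selector in square brackets where an element is identified by an attribute. The Python sketch below builds such a path from its parts. The helper name `element_path` is hypothetical and only formats strings; it doesn't validate the path against a real configuration.

```python
def element_path(*steps):
    """Build a dotted element path for ALTER SEARCH INDEX CONFIG.

    Each step is either an element name or a tuple of
    (element name, attribute name, attribute value), which is rendered
    as name[@attribute='value'].
    """
    parts = []
    for step in steps:
        if isinstance(step, str):
            parts.append(step)
        else:
            name, attribute, value = step
            parts.append(f"{name}[@{attribute}='{value}']")
    return ".".join(parts)


FACTORY = "org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory"

path = element_path(
    "indexConfig",
    ("mergePolicyFactory", "class", FACTORY),
    ("bool", "name", "mergeSingleSegments"),
)
statement = f"ALTER SEARCH INDEX CONFIG ON wiki.solr SET {path} = true;"

if __name__ == "__main__":
    print(statement)
```

Building the path from parts keeps the long factory class name in one place, which reduces the chance of a typo when several settings under the same element are changed.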
- parallelDeleteTasks
-
Regulates how many tasks are created to apply deletes during soft/hard commit in parallel. Supported for RT and NRT indexing.
Default: The number of available processors
Use the default value for parallelDeleteTasks unless write load causes problems in a mixed read/write workload. If writes occasionally spike in utilization and negatively impact your read performance, then you can try setting this value lower than the default. The value must be greater than 0.