Search index config
Reference information to change query behavior for search indexes:
-
DataStax recommends using the CQL CREATE SEARCH INDEX and ALTER SEARCH INDEX CONFIG commands.
-
dsetool commands can also be used to manage search indexes.
Changing search index config
To create and make changes to the search index config, follow these basic steps:
-
Create a search index. For example:
CREATE SEARCH INDEX ON demo.health_data;
-
Alter the search index. For example:
ALTER SEARCH INDEX CONFIG ON demo.health_data SET autoCommitTime = 30000;
-
Optionally view the XML of the pending search index. For example:
DESCRIBE PENDING SEARCH INDEX CONFIG on demo.health_data;
-
Make the pending changes active. For example:
RELOAD SEARCH INDEX ON demo.health_data;
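The four steps above always follow the same pattern, so they can be generated for any table and configuration shortcut. The Python sketch below only builds the CQL strings; `config_change_statements` is a hypothetical helper, not part of DSE. You would still run the statements yourself, for example in cqlsh. CREATE SEARCH INDEX is needed only when the index does not exist yet.

```python
def config_change_statements(table, option, value):
    """Build the create / alter / describe / reload sequence for one setting.

    Hypothetical helper for illustration: `table` is "keyspace.table" and
    `option` is a configuration shortcut such as autoCommitTime.
    """
    if table.count(".") != 1:
        raise ValueError("table must be given as keyspace.table")
    return [
        f"CREATE SEARCH INDEX ON {table};",
        f"ALTER SEARCH INDEX CONFIG ON {table} SET {option} = {value};",
        f"DESCRIBE PENDING SEARCH INDEX CONFIG ON {table};",
        f"RELOAD SEARCH INDEX ON {table};",
    ]


if __name__ == "__main__":
    # Print the statements so they can be pasted into cqlsh.
    for stmt in config_change_statements("demo.health_data", "autoCommitTime", 30000):
        print(stmt)
```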
Sample search index config
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<config>
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
  <luceneMatchVersion>LUCENE_6_0_0</luceneMatchVersion>
  <dseTypeMappingVersion>2</dseTypeMappingVersion>
  <directoryFactory class="solr.StandardDirectoryFactory" name="DirectoryFactory"/>
  <indexConfig>
    <rt>false</rt>
    <rtOffheapPostings>true</rtOffheapPostings>
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>512</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
    <reopenReaders>true</reopenReaders>
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="maxCommitsToKeep">1</str>
      <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>
    <infoStream file="INFOSTREAM.txt">false</infoStream>
  </indexConfig>
  <jmx/>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoSoftCommit>
      <maxTime>10000</maxTime>
    </autoSoftCommit>
  </updateHandler>
  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>
    <filterCache class="solr.SolrFilterCache" highWaterMarkMB="2048" lowWaterMarkMB="1024"/>
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
    <useColdSearcher>true</useColdSearcher>
    <maxWarmingSearchers>16</maxWarmingSearchers>
  </query>
  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000"/>
    <httpCaching never304="true"/>
  </requestDispatcher>
  <requestHandler class="solr.SearchHandler" default="true" name="search">
    <lst name="defaults">
      <int name="rows">10</int>
    </lst>
  </requestHandler>
  <requestHandler class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler" name="solr_query">
    <lst name="defaults">
      <int name="rows">10</int>
    </lst>
  </requestHandler>
  <requestHandler class="solr.UpdateRequestHandler" name="/update"/>
  <requestHandler class="solr.UpdateRequestHandler" name="/update/csv" startup="lazy"/>
  <requestHandler class="solr.UpdateRequestHandler" name="/update/json" startup="lazy"/>
  <requestHandler class="solr.FieldAnalysisRequestHandler" name="/analysis/field" startup="lazy"/>
  <requestHandler class="solr.DocumentAnalysisRequestHandler" name="/analysis/document" startup="lazy"/>
  <requestHandler class="solr.admin.AdminHandlers" name="/admin/"/>
  <requestHandler class="solr.PingRequestHandler" name="/admin/ping">
    <lst name="invariants">
      <str name="qt">search</str>
      <str name="q">solrpingquery</str>
    </lst>
    <lst name="defaults">
      <str name="echoParams">all</str>
    </lst>
  </requestHandler>
  <requestHandler class="solr.DumpRequestHandler" name="/debug/dump">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="echoHandler">true</str>
    </lst>
  </requestHandler>
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>
</config>
For CQL index management, use configuration element shortcuts with CQL commands.
Configuration elements are listed alphabetically by shortcut. The XML element is shown with the element start tag. An ellipsis indicates that other elements or attributes are not shown.
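To make the link between a shortcut and its XML element concrete, the Python sketch below reads shortcut values out of a config with the standard-library ElementTree. The mapping table is inferred from the sample config and the XML excerpts in this section, and is not an official DSE table. In particular, mapping ramBufferSize to `ramBufferSizeMB` and realtime to `rt` is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed shortcut -> (element path, attribute or None) mapping, inferred from
# the sample config; not an official DSE table.
SHORTCUTS = {
    "autoCommitTime": ("updateHandler/autoSoftCommit/maxTime", None),
    "directoryFactory": ("directoryFactory", "class"),
    "filterCacheHighWaterMark": ("query/filterCache", "highWaterMarkMB"),
    "filterCacheLowWaterMark": ("query/filterCache", "lowWaterMarkMB"),
    "mergeFactor": ("indexConfig/mergeFactor", None),
    "ramBufferSize": ("indexConfig/ramBufferSizeMB", None),  # assumption
    "realtime": ("indexConfig/rt", None),  # assumption
}


def read_shortcut(config_xml, shortcut):
    """Return the raw string value behind a shortcut, or None if it is absent."""
    root = ET.fromstring(config_xml)
    path, attr = SHORTCUTS[shortcut]
    elem = root.find(path)
    if elem is None:
        return None
    # Some shortcuts live in an attribute, others in the element text.
    return elem.get(attr) if attr else elem.text


# Trimmed excerpt of the sample search index config.
SAMPLE = """<config>
  <directoryFactory class="solr.StandardDirectoryFactory" name="DirectoryFactory"/>
  <indexConfig><rt>false</rt><ramBufferSizeMB>512</ramBufferSizeMB><mergeFactor>10</mergeFactor></indexConfig>
  <updateHandler class="solr.DirectUpdateHandler2"><autoSoftCommit><maxTime>10000</maxTime></autoSoftCommit></updateHandler>
  <query><filterCache class="solr.SolrFilterCache" highWaterMarkMB="2048" lowWaterMarkMB="1024"/></query>
</config>"""

if __name__ == "__main__":
    for name in SHORTCUTS:
        print(name, "=", read_shortcut(SAMPLE, name))
```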
autoCommitTime
-
Defines the time interval between updates to the search index with the most recent data after an INSERT, UPDATE, or DELETE. By default, changes are automatically committed every 10000 milliseconds. To change the time interval between updates:
-
Set auto commit time on the pending search index:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET autoCommitTime = 30000;
-
You can view the pending search config:
DESCRIBE PENDING SEARCH INDEX CONFIG on wiki.solr;
The resulting XML shows the maximum time between updates is 30000 milliseconds:
<updateHandler class="solr.DirectUpdateHandler2"> <autoSoftCommit> <maxTime>30000</maxTime> </autoSoftCommit> </updateHandler>
-
To make the pending changes active, reload the search index:
RELOAD SEARCH INDEX ON wiki.solr;
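The effect of SET autoCommitTime on the pending XML can be modeled as rewriting the `maxTime` value inside `autoSoftCommit`. DSE applies the real change on the server. The Python sketch below only illustrates the before and after XML, and its rejection of non-positive values is an assumption of this sketch rather than documented behavior.

```python
import xml.etree.ElementTree as ET


def set_auto_commit_time(update_handler_xml, millis):
    """Return a copy of an <updateHandler> fragment with a new soft-commit interval.

    Illustrative model of SET autoCommitTime; rejecting non-positive values
    is this sketch's own assumption.
    """
    if millis <= 0:
        raise ValueError("autoCommitTime must be a positive number of milliseconds")
    root = ET.fromstring(update_handler_xml)
    max_time = root.find("autoSoftCommit/maxTime")
    max_time.text = str(millis)
    return ET.tostring(root, encoding="unicode")


# Default fragment from the sample config (10000 ms).
DEFAULT = ('<updateHandler class="solr.DirectUpdateHandler2">'
           '<autoSoftCommit><maxTime>10000</maxTime></autoSoftCommit>'
           '</updateHandler>')

if __name__ == "__main__":
    print(set_auto_commit_time(DEFAULT, 30000))
```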
-
defaultQueryField
-
Name of the default field to query. Default not set. To set the field to use when no field is specified by the query, see Setting up default query field.
directoryFactory
-
The directory factory to use for search indexes. Encryption is enabled per search index. To enable encryption for a search index, change the class for directoryFactory to EncryptedFSDirectoryFactory.
-
Enable encryption on the pending search index:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET directoryFactory = EncryptedFSDirectoryFactory;
-
You can view the pending search config:
DESCRIBE PENDING SEARCH INDEX CONFIG on wiki.solr;
The resulting XML shows that encryption is enabled:
<directoryFactory class="solr.EncryptedFSDirectoryFactory" name="DirectoryFactory"/>
-
To make the pending changes active, reload the search index:
RELOAD SEARCH INDEX ON wiki.solr;
Even though additional properties are available to tune encryption, DataStax recommends using the default settings.
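To check whether a pending or active config has encryption turned on, you can inspect the `class` attribute of `directoryFactory`. The sketch below assumes that you have saved the XML printed by DESCRIBE PENDING SEARCH INDEX CONFIG into a string. The `encryption_enabled` function is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Class name taken from the XML shown above.
ENCRYPTED_CLASS = "solr.EncryptedFSDirectoryFactory"


def encryption_enabled(config_xml):
    """Return True if the config's directoryFactory uses the encrypted factory."""
    root = ET.fromstring(config_xml)
    factory = root.find("directoryFactory")
    return factory is not None and factory.get("class") == ENCRYPTED_CLASS


if __name__ == "__main__":
    pending = ('<config><directoryFactory class="solr.EncryptedFSDirectoryFactory" '
               'name="DirectoryFactory"/></config>')
    print(encryption_enabled(pending))
```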
-
filterCacheLowWaterMark
-
Default is 1024 MB. See below.
filterCacheHighWaterMark
-
Default is 2048 MB.
The DSE Search configurable filter cache reliably bounds the filter cache memory usage for a search index. This implementation contrasts with the default Solr implementation, which defines bounds for filter cache usage per segment.
SolrFilterCache bounding works by evicting cache entries once the configured per-search-index (per-core) high watermark is reached, and stops evicting once the configured low watermark is reached.
-
The filter cache is cleared when the search index is reloaded.
-
SolrFilterCache does not support auto-warming. SolrFilterCache defaults to off-heap memory. In general, the larger the index, the larger the filter cache should be. A good default is 1 to 2 GB. If the index has 1 billion documents per node, set it to 4 to 5 GB.
-
To change cache eviction for a large index, set the low and high values one at a time:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET filterCacheHighWaterMark = 5000;
ALTER SEARCH INDEX CONFIG ON wiki.solr SET filterCacheLowWaterMark = 2000;
-
View the pending search index config:
DESCRIBE PENDING SEARCH INDEX CONFIG on wiki.solr;
The resulting XML shows the new watermarks:
<query> ... <filterCache class="solr.SolrFilterCache" highWaterMarkMB="5000" lowWaterMarkMB="2000"/> ... </query>
-
To make the pending changes active, reload the search index:
RELOAD SEARCH INDEX ON wiki.solr;
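The high/low watermark behavior described above can be illustrated with a toy cache. The text does not say which entries SolrFilterCache evicts first, so this Python sketch assumes oldest-inserted-first. `WatermarkCache` is a hypothetical class and not the DSE implementation. The sketch shows only that eviction begins when usage reaches the high watermark and stops at the low watermark.

```python
from collections import OrderedDict


class WatermarkCache:
    """Toy model of high/low watermark eviction, sizes in MB.

    Eviction order (oldest insertion first) is an assumption of this sketch.
    """

    def __init__(self, low_mb, high_mb):
        if not 0 <= low_mb < high_mb:
            raise ValueError("low watermark must be below the high watermark")
        self.low_mb = low_mb
        self.high_mb = high_mb
        self.entries = OrderedDict()  # key -> size in MB, oldest first
        self.used_mb = 0

    def put(self, key, size_mb):
        if key in self.entries:
            self.used_mb -= self.entries.pop(key)
        self.entries[key] = size_mb
        self.used_mb += size_mb
        if self.used_mb >= self.high_mb:
            # High watermark reached: evict until usage drops to the low watermark.
            while self.entries and self.used_mb > self.low_mb:
                _, evicted_mb = self.entries.popitem(last=False)
                self.used_mb -= evicted_mb


if __name__ == "__main__":
    # Same watermarks as the ALTER statements above.
    cache = WatermarkCache(low_mb=2000, high_mb=5000)
    for i in range(6):
        cache.put(f"filter-{i}", 1000)
    print(cache.used_mb, list(cache.entries))
```

With 1000 MB entries, the fifth insert reaches 5000 MB, which evicts the three oldest entries down to 2000 MB. The sixth insert then brings usage to 3000 MB without triggering eviction.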
-
mergeFactor
-
When a new segment brings the number of lowest-level segments up to the merge factor value, those segments are merged together to form a single larger segment. When the merge factor is 10, each merge creates a single segment that is about ten times larger than each of its ten constituents. When there are 10 of these larger segments, they in turn are merged into an even larger single segment. Default is 10.
-
To change the number of segments to merge at one time:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET mergeFactor = 5;
-
View the pending search index config:
DESCRIBE PENDING SEARCH INDEX CONFIG on wiki.solr;
The resulting XML shows the new merge factor:
<indexConfig> ... <mergeFactor>5</mergeFactor> ... </indexConfig>
-
To make the pending changes active, reload the search index:
RELOAD SEARCH INDEX ON wiki.solr;
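The tiered merging described for mergeFactor can be simulated by counting segment sizes. The Python sketch below treats every flush as a segment of size 1. Whenever mergeFactor segments of the same size exist, it merges them into one segment that is mergeFactor times larger. This simplified model covers only the behavior in the paragraph above and ignores deletes, size limits, and Lucene's actual merge scheduling. `simulate_merges` is a hypothetical helper.

```python
def simulate_merges(num_flushes, merge_factor=10):
    """Return segment sizes (largest first) after num_flushes unit-size flushes.

    Simplified model: merge_factor segments of one size merge into a single
    segment merge_factor times larger, which may cascade to the next level.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    segments = []
    for _ in range(num_flushes):
        segments.append(1)  # a newly flushed lowest-level segment
        size = 1
        while segments.count(size) >= merge_factor:
            # Replace all segments of this level with one larger segment.
            segments = [s for s in segments if s != size]
            size *= merge_factor
            segments.append(size)
    return sorted(segments, reverse=True)


if __name__ == "__main__":
    print(simulate_merges(25, merge_factor=10))
    print(simulate_merges(25, merge_factor=5))
```

With the default of 10, 25 flushes leave two 10-unit segments plus five unmerged flushes. With a merge factor of 5, the same 25 flushes cascade into a single 25-unit segment. A lower merge factor therefore merges more often and keeps fewer segments.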
-
mergeMaxThreadCount
-
Must be configured together with mergeMaxMergeCount. The number of concurrent merges that Lucene can perform for the Solr core. The default mergeScheduler settings are set automatically. Do not adjust this setting.
mergeMaxMergeCount
-
Must be configured together with mergeMaxThreadCount. The number of pending merges (active and in the backlog) that can accumulate before segment merging starts to block or throttle incoming writes. The default mergeScheduler settings are set automatically. Do not adjust this setting.
ramBufferSize
-
The index RAM buffer size in megabytes (MB). The RAM buffer holds uncommitted documents. A larger RAM buffer reduces the number of flushes, and segments are larger when flushed. Fewer flushes reduce I/O pressure, which is ideal for write-heavy workloads. Default is 512.
For example, adjust the ramBufferSize when you configure live indexing:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET autoCommitTime = 100;
ALTER SEARCH INDEX CONFIG ON wiki.solr SET realtime = true;
ALTER SEARCH INDEX CONFIG ON wiki.solr SET ramBufferSize = 2048;
RELOAD SEARCH INDEX ON wiki.solr;
realtime
-
Enables live indexing to increase indexing throughput. Enable live indexing on only one node per cluster. Live indexing, also called real-time (RT) indexing, supports searching directly against the Lucene RAM buffer and more frequent, cheaper soft-commits, which provide earlier visibility to newly indexed data.
Live indexing requires a larger RAM buffer and more memory usage than an otherwise equivalent NRT setup. See Tuning RT indexing.
Configuration elements without shortcuts
To specify configuration elements that do not have shortcuts, you can specify the XML path to the setting and separate child elements using a period.
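Period-separated paths with attribute predicates are tedious to type by hand. The Python sketch below assembles them from a list of steps. `xml_path` and `alter_statement` are hypothetical helpers that only build strings. The predicate syntax `name[@attribute='value']` is copied from the mergePolicyFactory examples in this section. The sketch writes Python booleans as lowercase `true`/`false` to match those examples.

```python
def xml_path(*steps):
    """Join XML path steps with periods.

    Each step is an element name, or a (name, attribute, value) tuple that
    becomes name[@attribute='value'].
    """
    parts = []
    for step in steps:
        if isinstance(step, tuple):
            name, attr, value = step
            parts.append(f"{name}[@{attr}='{value}']")
        else:
            parts.append(step)
    return ".".join(parts)


def alter_statement(table, path, value):
    """Build an ALTER SEARCH INDEX CONFIG ... SET statement for a path."""
    if isinstance(value, bool):
        value = str(value).lower()  # True -> true, as in the CQL examples
    return f"ALTER SEARCH INDEX CONFIG ON {table} SET {path} = {value};"


POLICY = "org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory"

if __name__ == "__main__":
    path = xml_path("indexConfig",
                    ("mergePolicyFactory", "class", POLICY),
                    ("int", "name", "maxMergedSegmentMB"))
    print(alter_statement("wiki.solr", path, 5000))
```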
deleteApplicationStrategy
-
Controls how deleted documents are looked up when deletes are applied. seekexact is the safe default for most deployments; for a little extra performance, try seekceiling.
Valid case-insensitive values are:
-
seekexact
Uses bloom filters to avoid reading from most segments. Use when memory is limited and the unique key field data does not fit into memory.
-
seekceiling
More performant when documents are deleted/inserted into the database with sequential keys, because this strategy can stop reading from segments when it is known that terms can no longer appear.
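The difference between the two strategies can be illustrated on one segment modeled as a sorted list of unique keys. This Python sketch is a conceptual model only and is not how DSE or Lucene implement it. The exact strategy checks every delete key on its own, and a set lookup stands in for the bloom-filter-guarded lookup. The ceiling strategy walks the sorted delete keys and stops as soon as a key lies past the segment's last term, which pays off when keys are sequential. Both functions are hypothetical.

```python
import bisect


def seek_exact_hits(segment_keys, delete_keys):
    """Check each delete key independently (models seekexact)."""
    present = set(segment_keys)  # stand-in for a bloom filter plus term lookup
    return [k for k in delete_keys if k in present]


def seek_ceiling_hits(segment_keys, delete_keys):
    """Walk sorted delete keys with ceiling seeks (models seekceiling).

    segment_keys must be sorted. Returns (hits, number of seeks performed).
    """
    hits, seeks, pos = [], 0, 0
    for key in sorted(delete_keys):
        seeks += 1
        # Smallest segment key >= key, searching forward from the last position.
        pos = bisect.bisect_left(segment_keys, key, pos)
        if pos == len(segment_keys):
            break  # every remaining key is beyond this segment's last term
        if segment_keys[pos] == key:
            hits.append(key)
    return hits, seeks


if __name__ == "__main__":
    segment = [1, 2, 3, 10, 11]
    deletes = [22, 2, 20, 3, 21]
    print(seek_exact_hits(segment, deletes))
    print(seek_ceiling_hits(segment, deletes))
```

Both strategies find the same deleted keys. The ceiling walk stops after 3 seeks because key 20 lies past the last term in the segment.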
-
mergePolicyFactory
-
The AutoExpungeDeletesTieredMergePolicy custom merge policy is based on TieredMergePolicy. This policy cleans up large segments by merging them when deletes reach the percentage threshold. Only one auto-expunge merge runs at a time. Use it for large indexes that are not merging the largest segments because of deletes. To determine whether this merge setting is appropriate for your workflow, view the segments on the Solr Segment Info screen. When set, the XML is described as:
<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory">
    <int name="maxMergedSegmentMB">5000</int>
    <int name="forceMergeDeletesPctAllowed">25</int>
    <bool name="mergeSingleSegments">true</bool>
  </mergePolicyFactory>
</indexConfig>
To extend TieredMergePolicy to support automatic removal of deletes:
-
To enable automatic removal of deletes, set the custom policy:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET indexConfig.mergePolicyFactory[@class='org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory'].bool[@name='mergeSingleSegments'] = true;
-
Set the maximum segment size in MB:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET indexConfig.mergePolicyFactory[@class='org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory'].int[@name='maxMergedSegmentMB'] = 5000;
-
Set the percentage threshold for deleting from the large segments:
ALTER SEARCH INDEX CONFIG ON wiki.solr SET indexConfig.mergePolicyFactory[@class='org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory'].int[@name='forceMergeDeletesPctAllowed'] = 25;
If mergeFactor is in the existing index config, you must drop it from the search index before you alter the search index config to support automatic removal of deletes:
ALTER SEARCH INDEX CONFIG ON wiki.solr DROP indexConfig.mergePolicyFactory;
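To confirm what the pending config contains after these statements, the Python sketch below reads the typed `<int>` and `<bool>` children of `mergePolicyFactory` into a dictionary. It also includes a hypothetical threshold check that flags a segment whose deleted-document share has reached `forceMergeDeletesPctAllowed`. That check follows the "deletes reach the percentage threshold" wording above and is not taken from the policy's source code.

```python
import xml.etree.ElementTree as ET

# The mergePolicyFactory XML shown above.
POLICY_XML = """<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory">
    <int name="maxMergedSegmentMB">5000</int>
    <int name="forceMergeDeletesPctAllowed">25</int>
    <bool name="mergeSingleSegments">true</bool>
  </mergePolicyFactory>
</indexConfig>"""


def policy_params(index_config_xml):
    """Return the <int>/<bool> children of mergePolicyFactory as typed values."""
    factory = ET.fromstring(index_config_xml).find("mergePolicyFactory")
    params = {}
    for child in factory:
        name = child.get("name")
        if child.tag == "int":
            params[name] = int(child.text)
        elif child.tag == "bool":
            params[name] = child.text.strip().lower() == "true"
    return params


def reaches_delete_threshold(deleted_docs, max_docs, pct_allowed):
    """Hypothetical check: has the segment's deleted share reached pct_allowed percent?"""
    return max_docs > 0 and 100.0 * deleted_docs / max_docs >= pct_allowed


if __name__ == "__main__":
    params = policy_params(POLICY_XML)
    print(params)
    print(reaches_delete_threshold(300, 1000, params["forceMergeDeletesPctAllowed"]))
```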
-
parallelDeleteTasks
-
Sets how many tasks apply deletes in parallel during soft and hard commits. Supported for RT and NRT indexing. Specify a positive integer. The default value is the number of available processors.
Leave parallelDeleteTasks at the default value unless write load causes problems in a mixed read/write workload. If writes occasionally spike in utilization and hurt read performance, set this value lower. To prevent writes from overwhelming reads, reduce this value and adjust max_solr_concurrency_per_core in dse.yaml.
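The effect of parallelDeleteTasks can be pictured as splitting a batch of deletes across a fixed number of workers. This Python sketch is a conceptual model with hypothetical helper names and does not reflect DSE internals. It uses `os.cpu_count()` to approximate "the number of available processors" and rejects values below 1. Fewer tasks means fewer threads compete with queries, which is why lowering the value can protect reads.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def effective_parallel_delete_tasks(configured=None):
    """Default to the processor count (approximated by os.cpu_count())."""
    if configured is None:
        return os.cpu_count() or 1
    if configured < 1:
        raise ValueError("parallelDeleteTasks must be a positive integer")
    return configured


def apply_deletes(index_keys, delete_keys, tasks):
    """Split delete keys across `tasks` workers; return the keys found in the index."""
    chunks = [delete_keys[i::tasks] for i in range(tasks)]
    with ThreadPoolExecutor(max_workers=tasks) as pool:
        found = pool.map(lambda chunk: {k for k in chunk if k in index_keys}, chunks)
        return set().union(*found)


if __name__ == "__main__":
    tasks = effective_parallel_delete_tasks(2)
    print(apply_deletes({1, 2, 3, 4}, [2, 4, 6, 8], tasks))
```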