Increasing indexing throughput

Live indexing enables queries to be made against recently indexed data. Live indexing, also known as RT (real time) indexing, improves index throughput and reduces Lucene reader latency while supporting all Solr functionality.

Live indexing works for all DSE Search applications. Fields that are sorted on must use docValues; otherwise, the field cache is used, which is inefficient with live indexing.
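
The docValues requirement for sorted fields could be expressed in schema.xml roughly as follows. This is a minimal sketch, not a complete schema: the field name price and the type name int are hypothetical, and it assumes the standard Solr docValues attribute on a Trie numeric field.

```xml
<!-- Fragment of schema.xml; "price" and "int" are illustrative names only -->
<schema name="example" version="1.5">
  <types>
    <!-- Numeric type assumed for this example -->
    <fieldType name="int" class="solr.TrieIntField"/>
  </types>
  <fields>
    <!-- docValues="true" lets sorting on this field avoid the field cache -->
    <field name="price" type="int" indexed="true" stored="true" docValues="true"/>
  </fields>
</schema>
```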

The location of the dse.yaml file depends on the type of installation:

Installer-Services        /etc/dse/dse.yaml
Package installations     /etc/dse/dse.yaml
Installer-No Services     install_location/resources/dse/conf/dse.yaml
Tarball installations     install_location/resources/dse/conf/dse.yaml

Procedure

  1. Enable live indexing on only one Solr core per cluster.
  2. To enable live indexing, add <rt>true</rt> to the <indexConfig> element of the solrconfig.xml file:
    <rt>true</rt>
    
  3. To configure live indexing, edit the solrconfig.xml file to increase the RAM buffer size and set the autoSoftCommit time to 100 ms:
    <ramBufferSizeMB>2000</ramBufferSizeMB>
    ...
    <autoSoftCommit>
        <maxTime>100</maxTime>
    </autoSoftCommit>
    The larger RAM buffer enables faster indexing.
  4. Increase the heap size. For live indexing, DataStax recommends a heap size of at least 20 GB for use with Java 1.8 and G1GC. A larger heap lets you allocate a larger RAM buffer, which contributes to faster live (RT) indexing.
  5. In the dse.yaml file, set the value of the max_solr_concurrency_per_core option. In the same file, use the back_pressure_threshold_per_core option to define the number of buffered asynchronous index updates allowed per Solr core before back pressure is activated. The default value is 1000 times the number of available CPU cores.
  6. Restart DataStax Enterprise to use live indexing with the increased heap size.
  7. Optional: To filter on a given range, use a query of this form:
    _query_:"{!rtrange}tint:[0 TO 5}" OR _query_:"{!rtrange}tint:[-10 TO -5}"
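
Taken together, the solrconfig.xml changes from steps 2 and 3 might look like the sketch below. The values are the ones given in the procedure. Placing ramBufferSizeMB under indexConfig and autoSoftCommit under updateHandler follows the standard Solr layout and is an assumption here, as is the updateHandler class shown; an existing solrconfig.xml contains other elements that are omitted from this fragment.

```xml
<!-- Fragment of solrconfig.xml showing only the live indexing settings -->
<config>
  <indexConfig>
    <!-- Step 2: enable live (RT) indexing -->
    <rt>true</rt>
    <!-- Step 3: larger RAM buffer for faster indexing -->
    <ramBufferSizeMB>2000</ramBufferSizeMB>
  </indexConfig>
  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- Step 3: soft commit every 100 ms -->
    <autoSoftCommit>
      <maxTime>100</maxTime>
    </autoSoftCommit>
  </updateHandler>
</config>
```

The heap must be sized to accommodate the 2000 MB buffer, which is why step 4 recommends a heap of at least 20 GB.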