Tuning search for maximum indexing throughput
Where is the cassandra.yaml file?
The location of the cassandra.yaml file depends on the type of installation:

Installation Type | Location |
---|---|
Package installations + Installer-Services installations | /etc/dse/cassandra/cassandra.yaml |
Tarball installations + Installer-No Services installations | installation_location/resources/cassandra/conf/cassandra.yaml |
Where is the dse.yaml file?
The location of the dse.yaml file depends on the type of installation:

Installation Type | Location |
---|---|
Package installations + Installer-Services installations | /etc/dse/dse.yaml |
Tarball installations + Installer-No Services installations | installation_location/resources/dse/conf/dse.yaml |
To tune DataStax Enterprise (DSE) Search for maximum indexing throughput, follow the recommendations in this topic. Also see the related topics in Performance tuning and monitoring DSE Search. If search throughput improves in your development environment, consider using the recommendations in production.
DSE Search provides multi-threaded asynchronous indexing with a back pressure mechanism to avoid saturating available memory and to maintain stable performance.
Multi-threaded indexing improves performance on machines that have multiple CPU cores.
The IndexPool MBean
provides operational visibility and tuning through JMX.
Locate transactional and search data on separate SSDs
It is critical that you locate DSE Cassandra transactional data and Solr-based DSE Search data on separate Solid State Drives (SSDs). Failure to do so often results in sub-optimal search indexing performance.
For the steps to accomplish this task, refer to Set the location of search indexes.
In addition, plan for sufficient memory resources and disk space to meet operational requirements. Refer to Capacity planning for DSE Search.
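For reference, one way to separate the data is to mount a dedicated SSD and point the dse.solr.data.dir system property at it. The sketch below uses assumed paths and assumes a package installation where JVM flags can be added to jvm.options; see Set the location of search indexes for the supported options in your version:

    # Sketch only: mount point and file paths are assumptions for illustration.
    sudo mkdir -p /mnt/search-ssd/solr.data
    sudo chown -R cassandra:cassandra /mnt/search-ssd/solr.data
    # Pass the property to the JVM, for example via jvm.options on a package installation:
    echo '-Ddse.solr.data.dir=/mnt/search-ssd/solr.data' | sudo tee -a /etc/dse/cassandra/jvm.options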
Determine physical CPU resources
Before you tune anything, determine how many physical CPUs you have. The JVM does not know whether CPUs are using hyper-threading.
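For example (a quick sketch, not from the DataStax docs), on Linux you can inspect the CPU topology with lscpu. Physical cores are Socket(s) multiplied by Core(s) per socket, and a Thread(s) per core value greater than 1 indicates hyper-threading:

    # Sketch: show socket, core, and thread counts on Linux.
    # Physical (non-HT) cores = Socket(s) x Core(s) per socket.
    lscpu | grep -E 'Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core'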
Assess the IO throughput
DSE Search can be very IO intensive.
Performance is impacted by the Thread Per Core (TPC) architecture for asynchronous read and write paths.
In your development environment, check the iowait
system metric by using the iostat command during peak load.
For example, on Linux:
iostat -x -c -d -t 1 600
IOwait
is a measure of time over a given period that a CPU (or all CPUs) spent idle because all runnable tasks were waiting for an IO operation to complete.
While each environment is unique, a general guideline is to check whether iowait
is above 5% more than 5% of the time.
If that scenario occurs, try upgrading to faster SSD devices or tuning the machine to use less IO, and test again.
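As a rough sketch (assuming the iostat invocation shown above, which emits one CPU report per second for 600 seconds), the following command counts how many samples exceed the 5% iowait guideline; the awk field positions assume the standard iostat CPU report layout:

    # Sketch: count iostat samples where %iowait exceeds 5%.
    # %iowait is the 4th value on the line following each "avg-cpu:" header.
    iostat -x -c -d -t 1 600 | awk '/avg-cpu/ {getline; n++; if ($4 > 5) high++}
        END {printf "%d of %d samples had iowait > 5%%\n", high, n}'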
Again, it’s important to locate the search data on dedicated SSDs, separate from the transactional data.
Tuning in DSE 5.1.0
In DSE 5.1.0, Lucene merge scheduling and lack of parallelism might cause periods of zero indexing throughput, so tune the mergeScheduler settings manually as described below. (In DSE 5.1.1 and later, the default mergeScheduler settings are set automatically; do not adjust them. See Tuning in DSE 5.1.1 and later.)
dse.yaml

- First, set the size of the indexing thread pool per core based on the number of physical CPUs available:

      max_solr_concurrency_per_core: min(2, num physical non-HT CPU cores / num actively indexing Solr cores)

  If max_solr_concurrency_per_core is set to 1, DSE Search uses the legacy synchronous indexing implementation.

- Next, set a maximum queue depth before back pressure is activated. A good rule of thumb is 1000 updates per worker. The back_pressure_threshold_per_core option defines the number of buffered asynchronous index updates per Solr core before back pressure is activated. DataStax recommends a back_pressure_threshold_per_core value of 1000 * max_solr_concurrency_per_core. The default is 2000. (See the worked example after this list.)

      back_pressure_threshold_per_core: 1000 * max_solr_concurrency_per_core
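As an illustration only (the core counts are assumptions), a node with 16 physical non-HT CPU cores and 2 actively indexing Solr cores would get min(2, 16 / 2) = 2 indexing workers per core, and a back pressure threshold of 1000 * 2 = 2000:

    # Hypothetical dse.yaml excerpt for 16 physical cores and 2 actively indexing Solr cores.
    max_solr_concurrency_per_core: 2
    # 1000 updates per worker * 2 workers:
    back_pressure_threshold_per_core: 2000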
solrconfig.xml
Apply your maximum concurrency per core to the merge scheduler, where:

- maxThreadCount - maximum number of Lucene merge threads.
- maxMergeCount - number of outstanding merges before Lucene throttles incoming indexing workers. If maxMergeCount is too low, periods of zero indexing throughput occur.

      <indexConfig>
        ...
        <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
          <int name="maxThreadCount">your max_solr_concurrency_per_core</int>
          <int name="maxMergeCount">max(max(maxThreadCount * 2, num_tokens * 8), maxThreadCount + 5)</int>
        </mergeScheduler>
        ...
      </indexConfig>

where num_tokens is the number of token ranges to assign to the virtual node (vnode), as configured in cassandra.yaml.
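Continuing the hypothetical example above (max_solr_concurrency_per_core = 2) and assuming num_tokens is 1 in cassandra.yaml, the formula gives maxThreadCount = 2 and maxMergeCount = max(max(2 * 2, 1 * 8), 2 + 5) = 8:

    <!-- Hypothetical values for illustration; recompute for your own concurrency and num_tokens. -->
    <indexConfig>
      ...
      <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
        <int name="maxThreadCount">2</int>
        <int name="maxMergeCount">8</int>
      </mergeScheduler>
      ...
    </indexConfig>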
If the Solr data directory is mounted to a spinning disk, use:
    <indexConfig>
      ...
      <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
        <int name="maxThreadCount">1</int>
        <int name="maxMergeCount">6</int>
      </mergeScheduler>
      ...
    </indexConfig>
Tuning in DSE 5.1.1 and later
The dse.yaml
and search index config settings are automatically tuned optimally for nodes with a single active Solr core.
This automatic tuning is especially useful during auto-generated core creation.
Even with auto tuning, consider:
- When multiple Solr cores are in play, for example with DSE Graph, it still makes sense to scale down the resources assigned to them. However, rather than re-configuring every parameter, modify only max_solr_concurrency_per_core; all other tuning options, if not explicitly specified, are adjusted appropriately on restart. (A minimal override is sketched after this list.)

- The number of threads per core used for auto-tuning maxes out at 2, even when /proc/cpuinfo reports a higher thread count. In that case, the logs contain a message similar to:

      /proc/cpuinfo is reporting 8 threads per CPU core, but DSE will assume a value of 2 for auto-tuning...

- When the Lucene index files for your Solr core reside on a spinning/rotational hard disk, the solrconfig defaults are maxThreadCount 1 and maxMergeCount 6.

- Auto-tuning occurs only when options are not already specified. While the defaults are auto-tuned, they are still only defaults and do not override any manually specified options in dse.yaml (globally) or solrconfig.xml (per core).
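For illustration, such a minimal override might look like this in dse.yaml; the value 4 is an assumption for a hypothetical node, not a recommendation:

    # Hypothetical dse.yaml override: specify only the concurrency and let the
    # remaining indexing options auto-tune when the node restarts.
    max_solr_concurrency_per_core: 4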
Differences between indexing modes
There are two indexing modes in DSE Search:
- Near-real-time (NRT) indexing is the default indexing mode for Apache Solr™ and Apache Lucene™.
- Live indexing, also called real-time (RT) indexing, supports searching directly against the Lucene RAM buffer and more frequent, cheaper soft-commits, which provide earlier visibility to newly indexed data. However, RT indexing requires a larger RAM buffer and more memory usage than an otherwise equivalent NRT setup.
Tuning NRT reindexing
For NRT only, to maximize NRT throughput during a manual re-index, adjust these settings in the solrconfig.xml file:

- Increase the size of the RAM buffer, which is set to 512 MB by default. For example, increase it to 1000:

      <indexConfig>
        <useCompoundFile>false</useCompoundFile>
        <ramBufferSizeMB>1000</ramBufferSizeMB>
        <mergeFactor>10</mergeFactor>
        ...

  The RAM buffer holds uncommitted documents for NRT. A larger RAM buffer reduces flushes, and segments are larger when they are flushed. Fewer flushes reduce I/O pressure, which is ideal for higher write workloads.

- Increase the soft commit time, which is set to 10 seconds (10000 ms) by default, to a larger value. For example, increase the time to 60 seconds:

      <autoSoftCommit>
        <maxTime>60000</maxTime>
      </autoSoftCommit>

  A disadvantage of increasing the autoSoftCommit maxTime is that newly updated rows take longer than the usual 10000 ms to appear in search results. A combined excerpt with both changes is sketched after this list.
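As a sketch only, the two NRT changes above sit together in solrconfig.xml roughly as follows; both values are taken from the examples above and should be adjusted for your workload:

    <!-- Sketch: larger RAM buffer plus a longer soft commit for an NRT re-index. -->
    <indexConfig>
      <useCompoundFile>false</useCompoundFile>
      <ramBufferSizeMB>1000</ramBufferSizeMB>
      <mergeFactor>10</mergeFactor>
      ...
    </indexConfig>
    ...
    <updateHandler class="solr.DirectUpdateHandler2">
      <autoSoftCommit>
        <maxTime>60000</maxTime>
      </autoSoftCommit>
    </updateHandler>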
Tuning RT indexing
Live indexing uses more memory but reduces the time for docs to be searchable. During initial testing, enable live indexing on only one search index per cluster and monitor memory usage. Performance impact varies.
- To enable live indexing (also known as RT), add <rt>true</rt> to the <indexConfig> element of the solrconfig.xml file:

      <rt>true</rt>
- To configure live indexing, edit the solrconfig.xml file and change these settings:

  - Set autoSoftCommit/maxTime to 1000 ms (1 second). The autoSoftCommit maxTime for live indexing is not a soft commit; this value controls the refresh interval. For live indexing (RT), the refresh interval saturates at 1000 ms: any higher value is overridden to limit the maximum time to 1000 ms.

  - When live indexing is on, the RAM buffer uses more memory than near-real-time indexing. Set the RAM buffer size; 2048 MB is a good starting point for RT:

        <ramBufferSizeMB>2048</ramBufferSizeMB>

  - For faster live indexing, configure the postings portion of the RAM buffer to be allocated off-heap:

        <rtOffheapPostings>true</rtOffheapPostings>

    Postings allocated off-heap improve garbage collection (GC) times and prevent out-of-memory errors due to fluctuating live indexing memory usage.

  The combined settings in solrconfig.xml look like this:

      <indexConfig>
        ...
        <rt>true</rt>
        <ramBufferSizeMB>2048</ramBufferSizeMB>
        <rtOffheapPostings>true</rtOffheapPostings>
        ...
      </indexConfig>
      ...
      <updateHandler class="solr.DirectUpdateHandler2">
        ...
        <autoSoftCommit>
          <maxTime>1000</maxTime>
        </autoSoftCommit>
        ...
      </updateHandler>
- Increase the heap size to at least 14 GB. (One way to do this is sketched after this list.)

- Restart DataStax Enterprise to use live indexing with the increased heap size.
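How you raise the heap depends on your installation. As a sketch only, one common approach on a DSE 5.1 package installation is to set the JVM heap flags in the jvm.options file that sits alongside cassandra.yaml; the path and values below are assumptions to verify for your environment:

    # Hypothetical excerpt from /etc/dse/cassandra/jvm.options: fix the heap at 14 GB.
    -Xms14G
    -Xmx14G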