Capacity planning for DSE Search

DSE Search is memory-intensive. Solr rereads the entire row when updating indexes, which can impose a significant performance hit on spinning disks. Use solid-state drives (SSDs) for applications that have very aggressive insert and update requirements.

This capacity planning discovery process helps you develop a plan for having sufficient memory resources to meet the operational requirements. For hardware capacity, see Search node capacity.

Overview 

First, estimate how large your Solr index will grow by indexing a number of documents on a single node, executing typical user queries, and then examining the field cache memory usage for heap allocation. Repeat this process with progressively more documents until you have a solid estimate of the index size at the maximum number of documents that a single node can handle. You can then determine how many servers to deploy for a cluster and the optimal heap size. Store the index on SSDs, or ensure that it fits in the system IO cache.

Capacity planning requires a significant effort by operations personnel to achieve the best results:
  • Determine the optimal heap size per node.
  • Estimate the number of nodes that your application requires.
  • Increase the replication factor to support more queries per second.
  • When vnodes are not in use, distributed queries in DSE Search are most efficient when the number of nodes in the queried data center (DC) is a multiple of the replication factor (RF) in that DC (see the example after the note below).
Note: The Pre-flight tool can detect and fix many invalid or suboptimal configuration settings.
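
For example, a minimal sketch of the nodes-to-RF rule; the DC sizes and RF value shown are illustrative:

    # With vnodes disabled, distributed queries are most efficient when the
    # node count in the queried DC is a multiple of that DC's replication factor.
    def dc_is_query_efficient(node_count: int, rf: int) -> bool:
        return node_count % rf == 0

    print(dc_is_query_efficient(6, 3))  # True: 6-node DC with RF = 3
    print(dc_is_query_efficient(5, 3))  # False: consider adding a sixth node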

Prerequisites

A node with:
  • The amount of RAM that is determined during capacity planning
  • An SSD or a dedicated spinning disk. A dedicated SSD is recommended, but is not required.
Input data:
  • N documents indexed on a single test node
  • A complete set of sample queries to be executed
  • The maximum number of documents the system will support

Procedure

  1. Create the schema.xml and solrconfig.xml files.
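    The resource files can be uploaded over HTTP before the core is created. The following is a minimal sketch that assumes the DSE Solr resource endpoint on port 8983 and a hypothetical ks.tbl core; adjust the host, port, and core name for your environment:

      import urllib.request

      core = "ks.tbl"  # hypothetical keyspace.table core name
      base = f"http://localhost:8983/solr/resource/{core}"

      # Upload each resource file with the XML content type.
      for name in ("schema.xml", "solrconfig.xml"):
          with open(name, "rb") as f:
              req = urllib.request.Request(
                  url=f"{base}/{name}",
                  data=f.read(),
                  headers={"Content-Type": "text/xml; charset=utf-8"},
                  method="POST",
              )
              urllib.request.urlopen(req)
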
  2. Start a node.
  3. Add N documents.
  4. Run a range of queries that simulate a production environment.
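    A minimal sketch of steps 3 and 4 using the DataStax Python driver; the keyspace, table schema, document count, and sample queries are illustrative assumptions:

      from cassandra.cluster import Cluster

      N = 100_000  # documents to index in this test iteration

      # Hypothetical keyspace "ks" with a search-indexed table "tbl"
      # (id int, title text).
      session = Cluster(["127.0.0.1"]).connect("ks")

      # Step 3: add N documents.
      insert = session.prepare("INSERT INTO tbl (id, title) VALUES (?, ?)")
      for i in range(N):
          session.execute(insert, (i, f"sample document {i}"))

      # Step 4: run queries that simulate production traffic through the
      # solr_query CQL expression.
      for q in ("title:sample", "title:document AND id:[0 TO 999]"):
          rows = session.execute(
              "SELECT id FROM tbl WHERE solr_query = %s LIMIT 10", (q,)
          )
          print(q, "->", len(list(rows)))
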
  5. View the field cache statistics to discover the heap memory usage.
  6. View the on-disk size of the index, which is included in the status information for the Solr core.
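    A minimal sketch of steps 5 and 6 against the standard Solr admin HTTP endpoints; the host, port, and core name are assumptions, and the exact statistics entries vary by Solr version:

      import json
      import urllib.request

      base = "http://localhost:8983/solr"  # assumed DSE Search HTTP address
      core = "ks.tbl"                      # hypothetical core name

      # Step 5: per-core MBean statistics; inspect the CACHE category for
      # field cache memory usage.
      url = f"{base}/{core}/admin/mbeans?stats=true&cat=CACHE&wt=json"
      with urllib.request.urlopen(url) as resp:
          print(json.dumps(json.load(resp), indent=2))

      # Step 6: the CoreAdmin STATUS action reports the on-disk index size.
      url = f"{base}/admin/cores?action=STATUS&core={core}&wt=json"
      with urllib.request.urlopen(url) as resp:
          status = json.load(resp)
      print("index sizeInBytes:", status["status"][core]["index"]["sizeInBytes"])
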
  7. Based on the system IO cache available on the server, set a maximum index size per server.
  8. Based on the memory usage, set a maximum heap size required per server.
    • For JVM memory to provide the required performance and memory capacity, DataStax recommends a heap size of 14 GB or larger.
    • For live indexing, DataStax recommends a heap size of at least 20 GB for use with Java 1.8 and G1GC. A larger heap size allows you to allocate more RAM buffer size, which contributes to faster live (RT) indexing. Enable live indexing on only one Solr core per cluster.
  9. Calculate the maximum number of documents per node based on steps 6 and 7.

    When the system approaches the maximum number of documents per node, add more nodes.
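
    For example, a minimal sketch of the calculation in steps 7 through 9; every figure below is an illustrative assumption, not a sizing recommendation:

      # Measurements from the single-node test.
      test_docs = 100_000               # N documents indexed in step 3
      index_bytes = 2 * 1024**3         # on-disk index size from step 6
      fieldcache_bytes = 1 * 1024**3    # field cache heap usage from step 5

      # Per-server budgets from steps 7 and 8.
      max_index_bytes = 64 * 1024**3    # bounded by available system IO cache
      max_heap_bytes = 14 * 1024**3     # e.g. the recommended 14 GB minimum heap

      # Scale the test measurements to the budgets; the tighter constraint wins.
      docs_by_disk = test_docs * max_index_bytes // index_bytes
      docs_by_heap = test_docs * max_heap_bytes // fieldcache_bytes
      max_docs_per_node = min(docs_by_disk, docs_by_heap)

      # Nodes needed for the full corpus, before accounting for replication.
      total_docs = 500_000_000          # maximum documents the system supports
      nodes_needed = -(-total_docs // max_docs_per_node)  # ceiling division

      print(f"max documents per node: {max_docs_per_node:,}")
      print(f"nodes needed: {nodes_needed:,}")

    Because each document is stored RF times, multiply the node count by the replication factor in the DC when sizing the cluster.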