Capacity planning for DSE Search 

Use a discovery process to develop a DSE Search capacity plan that ensures sufficient memory resources.

DSE Search is memory-intensive, and index updates reread the entire row, which can cause a significant performance hit on spinning disks. Use solid-state drives (SSDs) for applications that have very aggressive insert and update requirements.

This discovery process helps you develop a capacity plan that provides sufficient memory resources to meet your operational requirements.

Overview 

First, estimate how large your search index will grow by indexing a number of documents on a single node, executing typical user queries, and then examining the memory usage for heap allocation. Repeat this process using a greater number of documents until you get a solid estimate of the size of the index for the maximum number of documents that a single node can handle. You can then determine how many servers to deploy for a cluster and the optimal heap size. Store the index on SSDs or in the system IO cache.
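To illustrate the extrapolation, the following sketch averages index bytes per document across several test runs; all document counts, sizes, and names are hypothetical placeholders for your own measurements.

    # Hypothetical extrapolation of index size from single-node test runs.
    # Each tuple is (documents indexed, measured on-disk index size in bytes);
    # the figures are placeholders, not real measurements.
    runs = [
        (1_000_000, 2.1e9),
        (5_000_000, 10.4e9),
        (10_000_000, 20.9e9),
    ]

    # Average index bytes per document across the runs.
    avg_bytes_per_doc = sum(size / docs for docs, size in runs) / len(runs)
    print(f"~{avg_bytes_per_doc:.0f} index bytes per document")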

Capacity planning requires a significant effort by operations personnel to:
  • Set the optimal heap size per node.
  • Estimate the number of nodes that are required for your application.
  • Increase the replication factor to support more queries per second.
  • Keep the number of nodes in each queried data center (DC) a multiple of the replication factor (RF) in that DC, because distributed queries in DSE Search are most efficient in that configuration (see the sketch after the note below).
Note: The Preflight check tool can detect and fix many invalid or suboptimal configuration settings.
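For example, if your estimates call for 10 nodes in a DC with RF 3, rounding up to 12 keeps the node count a multiple of the RF. A minimal sketch of that round-up, with placeholder numbers:

    import math

    def nodes_for_dc(required_nodes: int, replication_factor: int) -> int:
        """Round a node count up to the nearest multiple of the RF so that
        distributed queries in the DC stay efficient."""
        return math.ceil(required_nodes / replication_factor) * replication_factor

    print(nodes_for_dc(10, 3))  # -> 12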

Prerequisites

A node with:
  • The amount of RAM that is determined during capacity planning.
  • An SSD, or a spinning disk dedicated to the search index. A dedicated SSD is recommended, but is not required.
Input data:
  • N documents indexed on a single test node
  • A complete set of sample queries to be executed
  • The maximum number of documents the system will support

Procedure

  1. Create the schema.xml and solrconfig.xml files.
  2. Start a node.
  3. Add N docs.
  4. Run a range of queries that simulate a production environment.
  5. View the on-disk size of the index, which is included in the status information about the Solr core (this step can be scripted; see the status-query sketch after this procedure).
  6. Based on the server's available system IO cache, set a maximum index size per server.
  7. Based on the observed memory usage, set the maximum heap size required per server.
    • For JVM memory to provide the required performance and memory capacity, DataStax recommends a heap size of 14 gigabytes (GB) or larger.
    • For faster live indexing, configure live indexing (RT) postings to be allocated off-heap.
      Note: Enable live indexing on only one search core per cluster.
  8. Calculate the maximum number of documents per node, based on the limits from steps 6 and 7 (see the sizing sketch after this procedure).

    When the system approaches the maximum number of documents per node, add more nodes.
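Step 5 can be scripted against the standard Solr CoreAdmin STATUS API, which DSE Search exposes over HTTP (port 8983 by default). This is a sketch only; the host, core name, and exact response fields are assumptions to verify against your deployment.

    import json
    import urllib.request

    # Assumed values: replace with your own node address and core name
    # (DSE Search cores are named keyspace.table).
    host = "localhost"
    core = "my_keyspace.my_table"
    url = f"http://{host}:8983/solr/admin/cores?action=STATUS&core={core}&wt=json"

    with urllib.request.urlopen(url) as resp:
        status = json.load(resp)

    # The STATUS response reports the on-disk index size per core; verify
    # the field layout against your Solr version.
    print(status["status"][core]["index"]["sizeInBytes"], "bytes on disk")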
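A minimal sketch of the step 8 calculation, combining the index-size cap from step 6 with the heap cap from step 7; every constant is a placeholder for your own measurements.

    # Placeholder measurements; substitute the values from your test runs.
    bytes_per_doc_index = 2_100      # on-disk index bytes per document (step 5)
    heap_bytes_per_doc = 150         # heap bytes per indexed document under load
    max_index_bytes = 64 * 1024**3   # index-size cap from the IO cache (step 6)
    max_heap_bytes = 14 * 1024**3    # heap cap per server (step 7)

    # The binding constraint is whichever resource runs out first.
    docs_by_index = max_index_bytes // bytes_per_doc_index
    docs_by_heap = max_heap_bytes // heap_bytes_per_doc
    max_docs_per_node = min(docs_by_index, docs_by_heap)
    print(f"max documents per node: {max_docs_per_node:,}")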