Capacity planning for DSE Search 

Use a discovery process to develop a DSE Search capacity plan that ensures sufficient memory resources.

DSE Search is memory-intensive, and it rereads the entire row when updating indexes, which can cause a significant performance hit on spinning disks. Use solid-state drives (SSDs) for applications that have very aggressive insert and update requirements.

This capacity planning discovery process helps you develop a plan for provisioning sufficient memory resources to meet your operational requirements.

Overview 

First, estimate how large your search index will grow by indexing a number of documents on a single node, executing typical user queries, and then examining the memory usage for heap allocation. Repeat this process using a greater number of documents until you get a solid estimate of the size of the index for the maximum number of documents that a single node can handle. You can then determine how many servers to deploy for a cluster and the optimal heap size. Store the index on SSDs or in the system IO cache.
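For example, one pass of this measurement loop might look like the following minimal sketch. The core name (demo.products), host, and document set are placeholders; the commands assume the default Solr HTTP port (8983) and that dsetool and nodetool are on the PATH.

  # Generate schema.xml and solrconfig.xml and create the search core
  dsetool create_core demo.products generateResources=true

  # ... load your N test documents and run the sample query set here ...

  # Check the on-disk index size reported in the Solr core STATUS output
  curl -s "http://localhost:8983/solr/admin/cores?action=STATUS&core=demo.products&wt=json"

  # Check current heap usage on the node
  nodetool info | grep "Heap Memory"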

Capacity planning requires a significant effort by operations personnel to:
  • Set the optimal heap size per node.
  • Estimate the number of nodes that are required for your application.
  • Increase the replication factor to support more queries per second.
  • Size each data center so that its node count is a multiple of its replication factor: distributed queries in DSE Search are most efficient when the number of nodes in the queried data center (DC) is a multiple of the replication factor (RF) in that DC. For example, with RF = 3, deploy 3, 6, or 9 nodes. A sketch of this arithmetic follows the note below.
Note: The Preflight check tool can detect and fix many invalid or suboptimal configuration settings.
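As a rough illustration of that node-count arithmetic, the following sketch rounds the node count up to a multiple of the replication factor; all figures are placeholders to be replaced with your measured values.

  #!/bin/bash
  total_docs=300000000         # maximum documents the system must support
  max_docs_per_node=50000000   # per-node ceiling from the procedure below
  rf=3                         # replication factor in the queried DC

  # Minimum nodes needed to hold the data, rounded up
  nodes=$(( (total_docs + max_docs_per_node - 1) / max_docs_per_node ))

  # Round up to a multiple of RF so distributed queries stay efficient
  nodes=$(( (nodes + rf - 1) / rf * rf ))

  echo "Deploy at least $nodes nodes in the DC"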

Prerequisites

A node with:
  • The amount of RAM that is determined during capacity planning.
  • The disk layout that DataStax recommends:
    • One data/logs disk. (If using spinning disks, a separate disk for the commit log. See Disk space.)
    • One disk for DSE Search.
Input data:
  • N documents indexed on a single test node
  • A complete set of sample queries to be executed
  • The maximum number of documents the system will support

Procedure

  1. Create the schema.xml and solrconfig.xml files.
  2. Start a node.
  3. Add N docs.
  4. Run a range of queries that simulate a production environment.
  5. View the size of the index (on disk) that is included in the status information about the Solr core.
  6. Based on the server's available system IO cache, set a maximum index size per server.
  7. Based on the available system memory, set the maximum heap size required per server.
    For faster live indexing, DataStax recommends configuring live indexing (RT) postings to be allocated off-heap.
    Note: Enable live indexing on only one search core per cluster.
  8. Calculate the maximum number of documents per node based on steps 6 and 7; a sample calculation follows this procedure.

    When the system approaches the maximum number of documents per node, add more nodes.
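For step 8, the index-size side of that calculation can be sketched as follows (the heap measurement from step 7 bounds capacity in the same way); the figures are placeholders, not recommendations.

  #!/bin/bash
  docs_indexed=1000000    # N documents indexed on the test node
  index_size_gb=12        # on-disk index size observed in step 5
  io_cache_gb=48          # system IO cache budgeted for the index in step 6

  # Per-document index footprint measured on the test node
  bytes_per_doc=$(( index_size_gb * 1024 * 1024 * 1024 / docs_indexed ))

  # Document ceiling for one node, keeping the whole index in the IO cache
  max_docs_per_node=$(( io_cache_gb * 1024 * 1024 * 1024 / bytes_per_doc ))

  echo "Index footprint per document: $bytes_per_doc bytes"
  echo "Maximum documents per node:   $max_docs_per_node"

The resulting max_docs_per_node value feeds directly into the node-count sketch shown after the planning list above.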