Capacity planning for DSE Search

Use a discovery process to develop a DSE Search capacity plan that ensures sufficient memory resources.

DSE Search is memory-intensive, and it rereads the entire row when updating indexes, which can cause a significant performance hit on spinning disks. Use solid-state drives (SSDs) for applications that have aggressive insert and update requirements.

This capacity planning discovery process helps you develop a plan for having sufficient memory resources to meet the operational requirements.


First, estimate how large your search index will grow by indexing a number of documents on a single node, executing typical user queries, and then examining the memory usage for heap allocation. Repeat this process using a greater number of documents until you get a solid estimate of the size of the index for the maximum number of documents that a single node can handle. You can then determine how many servers to deploy for a cluster and the optimal heap size. Store the index on SSDs or in the system IO cache.
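The estimation loop above can be sketched as a simple extrapolation. This is a minimal sketch, not a DSE tool: the trial measurements below are hypothetical placeholders for the on-disk index sizes you observe on your own test node.

```python
# Rough sketch: extrapolate total index size from trial runs at
# increasing document counts on a single test node. Replace the
# hypothetical sample points with your measured on-disk index sizes.
trial_runs = [
    (1_000_000, 2.1),   # (documents indexed, index size in GB)
    (2_000_000, 4.3),
    (4_000_000, 8.8),
]

# Average GB per million documents across the trials.
gb_per_million = sum(size / (docs / 1e6) for docs, size in trial_runs) / len(trial_runs)

max_docs_total = 100_000_000  # maximum documents the system must support
projected_index_gb = gb_per_million * (max_docs_total / 1e6)
print(f"Projected total index size: {projected_index_gb:.0f} GB")
```

If index growth is not linear in document count (for example, with many unique terms), take measurements at more document counts and fit accordingly.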

Capacity planning requires a significant effort by operations personnel to:
  • Set the optimal heap size per node.
  • Estimate the number of nodes that are required for your application.
  • Increase the replication factor to support more queries per second. Distributed queries in DSE Search are most efficient when the number of nodes in the queried data center (DC) is a multiple of the replication factor (RF) in that DC.
Note: The Preflight check tool can detect and fix many invalid or suboptimal configuration settings.
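The multiple-of-RF guideline above is easy to verify programmatically when planning node counts. A minimal sketch (the node and RF values are illustrative, not recommendations):

```python
# Sketch: check whether a data center's node count is a multiple of its
# replication factor, which keeps distributed search queries efficient.
def nodes_are_balanced(nodes_in_dc: int, replication_factor: int) -> bool:
    """True when every replica set can be spread evenly across nodes."""
    return nodes_in_dc % replication_factor == 0

print(nodes_are_balanced(6, 3))  # 6 nodes, RF 3 -> True
print(nodes_are_balanced(7, 3))  # 7 nodes, RF 3 -> False
```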


  • DataStax recommends the following maximum index sizes:
    • Single index: 250 GB maximum

      Once a single index exceeds 250 GB or if performance degrades, consider adding nodes to further distribute the search index.

    • Multiple indexes: 500 GB maximum collective index size

      Supporting more than one index is hardware dependent with respect to the number of physical CPU cores available. DataStax recommends a minimum of two physical cores per search index where the maximum number of search indexes is the number of physical cores divided by two.

      For example, if a machine has 16 virtual CPUs on 8 physical cores, the recommended maximum number of search indexes is 4.

  • Set the location of search indexes.
  • Perform extensive testing or consult the DataStax Services team.


Provision each node with:
  • The amount of RAM that is determined during capacity planning.
  • One data/logs disk. (If using spinning disks, use a separate disk for the commit log. See Disk space.)
  • A dedicated drive for search indexes.
Input data:
  • N documents indexed on a single test node
  • A complete set of sample queries to be executed
  • The maximum number of documents the system will support


The capacity planning discovery steps:
  1. Create the schema.xml and solrconfig.xml files.
  2. Start a node.
  3. Add N docs.
  4. Run a range of queries that simulate a production environment.
  5. View the size of the index (on disk) included in the status information about the Solr core.
  6. Based on the server's system IO cache available, set a maximum index size per server.
  7. Based on the available system memory, set a maximum heap size required per server.
    DataStax recommends the following heap sizes:
    • System memory less than 64 GB: 24 GB heap
    • System memory 64 GB or greater: 30 GB heap
    For faster live indexing, configure live indexing (real-time, RT) postings to be allocated off-heap.
    Note: Enable live indexing on only one search core per cluster.
  8. Calculate the maximum number of documents per node based on the results of steps 6 and 7.

    When the system is approaching the maximum docs per node, add more nodes.
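Steps 6 through 8 reduce to straightforward arithmetic. A minimal sketch, where every input value is a hypothetical placeholder for your own measurements from the discovery steps:

```python
import math

# Sketch of steps 6-8: derive the maximum documents per node from the
# per-node index-size ceiling, then project how many nodes are needed.
# All values below are hypothetical; substitute your measured numbers.
io_cache_gb = 60             # index size budget per server (step 6)
gb_per_million_docs = 2.15   # measured index size on the test node (step 5)

# Step 8: maximum documents a single node can index within its budget.
max_docs_per_node = (io_cache_gb / gb_per_million_docs) * 1e6

total_docs = 100_000_000     # maximum documents the system must support
nodes_needed = math.ceil(total_docs / max_docs_per_node)

print(f"Max docs per node: {max_docs_per_node:,.0f}")
print(f"Nodes needed (before replication): {nodes_needed}")
```

The result is a floor, not a final cluster size: multiply by the replication factor and leave headroom so nodes are not running at their maximum when you need to add capacity.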