Capacity planning

Use a discovery process to develop a plan to ensure sufficient memory resources.

DSE Search/Solr is memory-intensive. Solr rereads the entire row when updating indexes, which can impose a significant performance hit on spinning disks. Use solid-state drives (SSDs); they are critical for applications with very aggressive insert and update requirements.

This section describes a discovery process intended to help you, the DSE Search/Solr administrator, develop a plan for having sufficient memory resources to meet the needs of your users.

Overview 

First, estimate how large your Solr index will grow by indexing a number of documents on a single node, executing typical user queries, and then examining the field cache memory usage for heap allocation. Repeat this process with progressively larger document counts until you understand the index size for the maximum number of documents that a single node can handle. You can then determine how many servers to deploy for a cluster and the optimal heap size. The index should be stored on SSDs or should fit in the system IO cache.
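For example, after loading a sample of documents onto the test node, you can replay a batch of representative user queries through Solr's standard HTTP select handler and record response times while you observe memory usage. The following Python sketch is only an illustration; the host, port, core name (keyspace.table), and query list are assumptions that you must adapt to your environment.

    # Hypothetical sketch: replay representative queries against a single
    # DSE Search/Solr test node and record latencies. Host, port, core name,
    # and the query list are placeholders for your own environment.
    import json
    import time
    import urllib.parse
    import urllib.request

    SOLR_URL = "http://localhost:8983/solr"   # assumed DSE Search HTTP endpoint
    CORE = "ks.demo"                          # assumed core (keyspace.table)

    SAMPLE_QUERIES = [                        # replace with real user queries
        "body:cassandra",
        "title:search AND body:index",
    ]

    for q in SAMPLE_QUERIES:
        params = urllib.parse.urlencode({"q": q, "rows": 10, "wt": "json"})
        url = "{}/{}/select?{}".format(SOLR_URL, CORE, params)
        start = time.time()
        with urllib.request.urlopen(url) as resp:
            body = json.loads(resp.read().decode("utf-8"))
        elapsed_ms = (time.time() - start) * 1000
        print("{:8.1f} ms  {:>10} hits  {}".format(
            elapsed_ms, body["response"]["numFound"], q))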

Although capacity planning requires a serious effort from operations personnel, the time is well spent when planning achieves these results:
  • The optimal heap size per node.
  • A good estimate of the number of nodes needed for your application.

    The replication factor can be increased to support more queries per second.

Note: The Pre-flight tool can detect and fix many invalid or suboptimal configuration settings.

Prerequisites

You need to have the following hardware and data:

A node with:
  • RAM, amount to be determined during capacity planning
  • SSD or spinning disk
Input data:
  • N documents indexed on a single test node
  • A complete set of sample queries to be executed
  • The total number of documents the system should support

Capacity planning process 

Procedure

  1. Create schema.xml and solrconfig.xml files.
  2. Start a node.
  3. Add N docs.
  4. Run a range of queries that simulate those of users in a production environment.
  5. Examine the field cache memory usage to discover how much heap memory is being used.
  6. View the on-disk size of the index, which is included in the status information about the Solr core; see the first sketch after this procedure for one way to retrieve both measurements.
  7. Based on the system IO cache available on the server, set a maximum index size per server.
  8. Based on the memory usage, set a maximum heap size required per server.
  9. Calculate the maximum number of documents per node from the per-server limits set in steps 7 and 8 and the measurements taken in steps 5 and 6; see the second sketch after this procedure.

    When the system approaches the maximum number of documents per node, add more nodes.
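
For steps 5 and 6, both measurements are visible in the Solr Admin UI and through JMX, and they can also be pulled over Solr's standard HTTP admin APIs. The Python sketch below shows one way to do that; the host, port, and core name are assumptions, and the exact layout of the cache statistics varies with the Solr version bundled in your DSE release.

    # Hypothetical sketch: pull field cache statistics (step 5) and on-disk
    # index size (step 6) from a single test node. Host, port, and core name
    # are placeholders; the stats layout differs between Solr versions.
    import json
    import urllib.request

    SOLR_URL = "http://localhost:8983/solr"   # assumed DSE Search HTTP endpoint
    CORE = "ks.demo"                          # assumed core (keyspace.table)

    def get_json(url):
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8"))

    # Step 5: cache statistics, including the Lucene field cache, from the
    # mbeans handler. Inspect the entries to estimate heap used for N docs.
    mbeans = get_json(
        "{}/{}/admin/mbeans?stats=true&cat=CACHE&wt=json".format(SOLR_URL, CORE))
    print(json.dumps(mbeans["solr-mbeans"], indent=2))

    # Step 6: index size on disk from the CoreAdmin STATUS action.
    status = get_json(
        "{}/admin/cores?action=STATUS&core={}&wt=json".format(SOLR_URL, CORE))
    index = status["status"][CORE]["index"]
    print("Index size on disk: {} ({} bytes)".format(
        index.get("size"), index.get("sizeInBytes")))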
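
Step 9 is simple arithmetic once the measurements are in hand: divide each per-server limit by the per-document cost observed for the sample documents and take the smaller result. All numbers in the sketch below are invented placeholders that only illustrate the calculation; substitute the values measured on your test node.

    # Hypothetical worked example of step 9. All numbers are placeholders.
    import math

    docs_indexed       = 2_000_000          # N sample docs (step 3)
    index_bytes        = 30 * 1024**3       # on-disk index for N docs (step 6)
    heap_bytes         = 6 * 1024**3        # heap used by the field cache (step 5)

    max_index_per_node = 256 * 1024**3      # fits on SSD / in system IO cache (step 7)
    max_heap_per_node  = 14 * 1024**3       # maximum usable heap per server (step 8)

    index_per_doc = index_bytes / docs_indexed
    heap_per_doc  = heap_bytes / docs_indexed

    # The tighter of the two limits sets the maximum documents per node.
    max_docs_per_node = min(max_index_per_node / index_per_doc,
                            max_heap_per_node / heap_per_doc)

    total_docs = 1_000_000_000              # documents the system must support
    nodes_needed = math.ceil(total_docs / max_docs_per_node)

    print("Max docs per node: {:,.0f}".format(max_docs_per_node))
    print("Nodes needed (before replication): {}".format(nodes_needed))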