Capacity planning

Using DSE Search is memory-intensive. Use a discovery process to develop a capacity plan to ensure sufficient memory resources.

Solr rereads the entire row when updating indexes, which can impose a significant performance hit on spinning disks. Use solid-state drives (SSDs) for applications that have very aggressive insert and update requirements.

This capacity planning discovery process helps you plan for enough memory to meet your operational requirements. For general advice on capacity planning, see Planning and testing cluster deployments.

Overview

First, estimate how large your search index will grow by indexing a number of documents on a single node, executing typical user queries, and then examining the field cache memory usage for heap allocation. Repeat this process using a greater number of documents until you get a solid estimate of the size of the index for the maximum number of documents that a single node can handle. You can then determine how many servers to deploy for a cluster and the optimal heap size. Store the index on SSDs or in the system IO cache.
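The repeated measurements described above can be projected to a larger document count. The following is a minimal sketch that assumes index size grows roughly linearly with the number of documents, which may not hold for every schema or query mix. The function names and the sample measurements are hypothetical and are not part of DSE Search.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate(xs, ys, target_x):
    """Project the fitted line to target_x (assumes linear growth)."""
    slope, intercept = fit_line(xs, ys)
    return slope * target_x + intercept


# Hypothetical measurements from three test runs on a single node.
doc_counts = [1_000_000, 2_000_000, 4_000_000]
index_size_gb = [1.5, 3.0, 6.0]

projected_gb = extrapolate(doc_counts, index_size_gb, 10_000_000)
print(f"Projected index size at 10M documents: {projected_gb:.1f} GB")
```

The same fit can be applied to the field cache memory readings. If the measured points do not fall close to a line, take more measurements instead of trusting the projection.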

Capacity planning requires a significant effort by operations personnel to:

Set the optimal heap size per node.
Estimate the number of nodes that are required for your application.
Increase the replication factor to support more queries per second.
Distributed queries in DSE Search are most efficient when the number of nodes in the queried data center (DC) is a multiple of the replication factor (RF) in that DC.
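To illustrate the guideline about node counts and the replication factor, the sketch below rounds a minimum node count up to the nearest multiple of RF. The function name is hypothetical, and the minimum node count is assumed to come from your own capacity estimate.

```python
def round_up_to_rf_multiple(min_nodes, replication_factor):
    """Return the smallest multiple of replication_factor that is >= min_nodes."""
    if min_nodes < 1 or replication_factor < 1:
        raise ValueError("min_nodes and replication_factor must be positive")
    # Ceiling division using integers only, then scale back up.
    groups = -(-min_nodes // replication_factor)
    return groups * replication_factor


# Hypothetical example: the estimate calls for at least 7 nodes, with RF = 3.
print(round_up_to_rf_multiple(7, 3))
```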

Note: The Preflight tool can detect and fix many invalid or suboptimal configuration settings.

Prerequisites

A node with:

The amount of RAM that is determined during capacity planning.
An SSD, or a spinning disk that is dedicated to the index. A dedicated SSD is recommended, but is not required.

Input data:

N documents indexed on a single test node
A complete set of sample queries to be executed
The maximum number of documents the system will support

Procedure

1. Create the schema.xml and solrconfig.xml files.
2. Start a node.
3. Add N documents.
4. Run a range of queries that simulate a production environment.
5. View the status of the field cache memory to discover the memory usage.
6. View the size of the index (on disk), which is included in the status information about the Solr core.
7. Based on the system IO cache available on the server, set a maximum index size per server.
8. Based on the memory usage, set the maximum heap size required per server.
   - For JVM memory to provide the required performance and memory capacity, DataStax recommends a heap size of 14 GB or larger.
   - For faster live indexing, configure live indexing (RT) postings to be allocated offheap. See Configuring and tuning indexing performance.
9. Calculate the maximum number of documents per node based on steps 6 and 7.
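The final calculation in the procedure can be sketched as follows. This is an illustration under stated assumptions, not a DataStax formula: it assumes that index size and field cache memory both scale linearly with document count, and that you have chosen a field cache budget within the heap. All names and sample values are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def max_docs_per_node(test_docs, test_index_bytes, test_field_cache_bytes,
                      max_index_bytes, field_cache_budget_bytes):
    """Estimate the document limit per node from single-node test measurements.

    The limit is the smaller of two bounds: documents that fit within the
    maximum index size, and documents whose field cache fits within the
    heap budget. Integer arithmetic avoids floating-point rounding.
    """
    limit_by_index = max_index_bytes * test_docs // test_index_bytes
    limit_by_heap = field_cache_budget_bytes * test_docs // test_field_cache_bytes
    return min(limit_by_index, limit_by_heap)


# Hypothetical test run: 1M documents, 2 GiB index, 512 MiB field cache.
# Hypothetical limits: 64 GiB maximum index size, 8 GiB field cache budget.
limit = max_docs_per_node(
    test_docs=1_000_000,
    test_index_bytes=2 * GIB,
    test_field_cache_bytes=512 * MIB,
    max_index_bytes=64 * GIB,
    field_cache_budget_bytes=8 * GIB,
)
print(f"Estimated maximum documents per node: {limit:,}")
```

In this example the heap budget is the tighter bound, so raising the maximum index size alone would not increase the limit.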

When the system is approaching the maximum docs per node, add more nodes.
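One way to decide when a node is "approaching" the maximum is to compare its current document count with a fraction of the limit. The sketch below assumes an 80% threshold and an even distribution of documents across nodes. Both are illustrative assumptions rather than DataStax recommendations, and the function names are hypothetical.

```python
def needs_more_nodes(current_docs_per_node, max_docs_per_node, threshold=0.8):
    """Return True when a node holds at least `threshold` of its document limit."""
    return current_docs_per_node >= threshold * max_docs_per_node


def nodes_required(total_docs, max_docs_per_node, replication_factor):
    """Minimum node count, assuming replicas are spread evenly across nodes."""
    total_replicas = total_docs * replication_factor
    # Ceiling division using integers only.
    return -(-total_replicas // max_docs_per_node)


# Hypothetical values: 16M document limit per node, RF = 3.
print(needs_more_nodes(13_000_000, 16_000_000))
print(nodes_required(100_000_000, 16_000_000, 3))
```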