Capacity planning for DSE Search
Using DSE Search is memory-intensive. Solr rereads the entire row when updating indexes, which can impose a significant performance hit on spinning disks. Use solid-state drives (SSDs) for applications with aggressive insert and update requirements.
This capacity planning discovery process helps you ensure sufficient memory resources for your operational requirements. For hardware capacity, see Search node capacity.
Overview
First, estimate how large your Solr index will grow by indexing a number of documents on a single node, executing typical user queries, and then examining the field cache memory usage for heap allocation. Repeat this process with a greater number of documents until you have a reliable estimate of the index size for the maximum number of documents that a single node can handle. Store the index on SSDs or ensure that it fits in the system I/O cache. This process yields:

- The optimal heap size per node.
- An estimate of the number of nodes that your application requires.

Also keep in mind:

- Increase the replication factor to support more queries per second.
- When vnodes are not in use, distributed queries in DSE Search are most efficient when the number of nodes in the queried data center (DC) is a multiple of the replication factor (RF) in that DC.
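The node-count guideline above can be sketched as a small calculation. This is not an official DSE tool; the function name and example numbers are illustrative assumptions.

```python
# Sketch: round a capacity estimate for one DC up to the next multiple
# of the replication factor, per the efficiency guideline above.
import math

def nodes_for_dc(estimated_nodes: int, replication_factor: int) -> int:
    """Smallest multiple of RF that is >= the estimated node count."""
    return math.ceil(estimated_nodes / replication_factor) * replication_factor

# Example: capacity planning suggested 10 nodes and the DC uses RF = 3,
# so deploying 12 nodes keeps distributed queries efficient.
print(nodes_for_dc(10, 3))  # -> 12
```

If the estimate already lands on a multiple of RF (for example, 9 nodes at RF = 3), no rounding is needed.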
Prerequisites
- The amount of RAM that is determined during capacity planning
- An SSD, or a spinning disk with its own dedicated disk; a dedicated SSD is recommended but not required.
- N documents indexed on a single test node
- A complete set of sample queries to be executed
- The maximum number of documents the system will support
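With these inputs gathered, the discovery process amounts to extrapolating the per-document index cost from your test runs and dividing the target document count across nodes. The following is a minimal sketch under the assumption that index size grows roughly linearly with document count; all names and figures are illustrative, not DSE output.

```python
# Sketch: extrapolate total index size and node count from single-node tests.
def plan(samples, max_docs, docs_per_node):
    """samples: list of (docs_indexed, index_bytes) measured on one test node.
    max_docs: maximum number of documents the system must support.
    docs_per_node: documents one node can handle (from your heap/cache tests).
    """
    # Derive per-document cost from the largest test run, assuming
    # index size scales roughly linearly with document count.
    docs, index_bytes = max(samples)
    bytes_per_doc = index_bytes / docs
    total_index_bytes = max_docs * bytes_per_doc
    nodes = -(-max_docs // docs_per_node)  # ceiling division
    return total_index_bytes, nodes

# Illustrative measurements: 100k docs -> ~2 GB index, 500k docs -> ~9.5 GB.
samples = [(100_000, 2_000_000_000), (500_000, 9_500_000_000)]
total_bytes, node_count = plan(samples, max_docs=10_000_000,
                               docs_per_node=2_000_000)
```

Remember to multiply the resulting node count by the replication factor requirements discussed above before finalizing hardware orders.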