DSE Search initial data migration
When you initially load data into a DataStax Enterprise (DSE) database, plan for resource contention so that performance does not suffer.
-
DSE is performant when writing data.
-
Apache Solr™ is resource intensive when creating a search index.
These two activities compete for resources, so proper resource allocation is critical to maximize efficiency for initial data load.
Recommendations
-
For maximum throughput, store the search index data and DataStax Enterprise (Cassandra) data on separate physical disks.
If you are unable to use separate disks, DataStax recommends that SSDs have a minimum of 500 MB/s read/write speeds (bandwidth).
-
Enable the OpsCenter 6.1 Repair Service.
Initial bulk loading
DataStax recommends following this high-level procedure:
-
Install DSE and configure nodes for search workloads.
-
Use the CQL CREATE SEARCH INDEX command to create search indexes.
-
Tune the index for maximum indexing throughput.
-
Load data into the database with the index in place. For example, load data with the driver with the consistency level set to LOCAL_ONE (CL.LOCAL_ONE) and a sufficiently high write timeout. Use best practices for data loading. Use the DataStax Bulk Loader.
After data loading is completed, there might be lag time because indexing is asynchronous.
-
Verify the indexing QueueSize with the IndexPool MBean. After the index queue size has receded, run this CQL query to verify that the number of records is as expected:
SELECT count(*) FROM ks.table WHERE solr_query = '*:*';
New data is automatically indexed.
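The load and verification steps above can be sketched with the DataStax Python driver. This is a minimal sketch under stated assumptions, not DataStax's reference procedure: the helper names (connect, load_rows, search_count, wait_for_count), the 120-second timeout, and the retry counts are all hypothetical, and the timeout shown is the driver's client-side request timeout, which may not be the write timeout the procedure has in mind. The driver is imported lazily inside connect so the other helpers can be exercised without a cluster.

```python
import time
from typing import Callable, Iterable, Sequence

# Query from the verification step; ks.table is the placeholder used in this topic.
COUNT_CQL = "SELECT count(*) FROM ks.table WHERE solr_query = '*:*';"


def connect(contact_points: Sequence[str], timeout_s: float = 120.0):
    """Open a session whose default profile uses LOCAL_ONE and a long timeout.

    Assumption: timeout_s is an illustrative value, not a recommendation.
    """
    # Imported here so the helpers below work without the driver installed.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import EXEC_PROFILE_DEFAULT, Cluster, ExecutionProfile

    profile = ExecutionProfile(
        consistency_level=ConsistencyLevel.LOCAL_ONE,
        request_timeout=timeout_s,  # client-side timeout, in seconds
    )
    cluster = Cluster(contact_points, execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    return cluster.connect()


def load_rows(session, insert_cql: str, rows: Iterable[tuple]) -> int:
    """Insert rows one at a time with a prepared statement; return rows written."""
    prepared = session.prepare(insert_cql)
    written = 0
    for row in rows:
        session.execute(prepared, row)
        written += 1
    return written


def search_count(session, cql: str = COUNT_CQL) -> int:
    """Run the search count query and return the single count value."""
    return session.execute(cql).one()[0]


def wait_for_count(
    fetch_count: Callable[[], int],
    expected: int,
    attempts: int = 30,
    delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the indexed record count equals expected.

    Returns False if the count never matches within the given attempts,
    which is the cue to move on to troubleshooting.
    """
    for attempt in range(attempts):
        if fetch_count() == expected:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

A caller would typically open a session with connect, pass an INSERT statement with ? placeholders and an iterable of tuples to load_rows, and then call wait_for_count(lambda: search_count(session), expected=n). The one-row-at-a-time loop keeps the sketch short; for real bulk loads, the DataStax Bulk Loader or concurrent execution is likely a better fit.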
Troubleshooting
If the record count does not stabilize:
-
If dropped mutations exist in the nodetool tpstats output for some nodes, and the OpsCenter repair service is not enabled, run manual repair on those nodes.
-
If dropped mutations do not exist, check the system.log and the Solr validation log for indexing errors.
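The troubleshooting decision above can be expressed as a small helper that reads captured nodetool tpstats output. This is an illustrative sketch only: the function names and returned messages are hypothetical, and it assumes that the dropped-message section reports mutations on a line whose first field is MUTATION and whose second field is the dropped count. That layout can differ between versions, so check it against your own output. The case where mutations were dropped and the repair service is already enabled is not covered by the guidance above; the sketch assumes the repair service handles it.

```python
RUN_REPAIR = "run manual repair on the affected nodes"
RELY_ON_REPAIR_SERVICE = "let the OpsCenter repair service reconcile the nodes"
CHECK_LOGS = "check system.log and the Solr validation log for indexing errors"


def dropped_mutations(tpstats_output: str) -> int:
    """Return the dropped count from the MUTATION line, or 0 if none is found.

    Assumption: the line looks like 'MUTATION <dropped> ...'. Thread pool rows
    such as MutationStage are ignored because their first field differs.
    """
    for line in tpstats_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "MUTATION" and fields[1].isdigit():
            return int(fields[1])
    return 0


def next_step(dropped: int, repair_service_enabled: bool) -> str:
    """Map the two troubleshooting conditions to a suggested action."""
    if dropped > 0:
        if repair_service_enabled:
            # Assumption: not stated in the guidance above.
            return RELY_ON_REPAIR_SERVICE
        return RUN_REPAIR
    return CHECK_LOGS
```

To use it, capture the output of nodetool tpstats on each node (for example with subprocess.run), pass the text to dropped_mutations, and feed the result to next_step together with whether the repair service is enabled.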