DSE Search initial data migration
Best practices and guidelines for loading data into DSE Search.
When you initially load data into DataStax Enterprise (DSE), plan for resource contention to ensure performance.
DSE is performant when writing data, but Apache Solr™ is resource-intensive when creating a search index.
These two activities compete for the same resources, so proper resource allocation is critical to maximize efficiency during the initial data load.
DataStax recommends following this high-level procedure:
1. Install DSE and configure nodes for search workloads.
2. Use the CQL CREATE SEARCH INDEX command to create search indexes.
3. Tune the index for maximum indexing throughput.
4. Load data into the database with the index in place, using either of these methods:
   - Load data with a driver, with the consistency level set to LOCAL_ONE and a sufficiently high write timeout. Follow best practices for data loading.
   - Use the DataStax Bulk Loader.
5. After data loading is completed, there might be a lag before all records are searchable because indexing is asynchronous. Verify the indexing queue size with the IndexPool MBean. After the index queue size has receded, run this CQL query to verify that the number of records is as expected:
   SELECT count(*) FROM ks.table WHERE solr_query = '*:*';
After the initial load, new data is automatically indexed.
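The procedure above can be sketched in Python. This is a minimal, driver-agnostic sketch under stated assumptions: it only builds the CQL statements for index creation, tuning, and the final count, and it polls a caller-supplied function for the indexing queue size. It does not connect to DSE. The keyspace and table names (`ks`, `products`), the 30000 ms `autoCommitTime` value, and the helper names (`build_index_statements`, `count_query`, `wait_for_index_queue`, `read_queue_size`) are hypothetical. Raising `autoCommitTime` is shown as one possible throughput tweak, not as the definitive tuning for your workload.

```python
import time
from typing import Callable, List


def build_index_statements(keyspace: str, table: str, auto_commit_ms: int = 30000) -> List[str]:
    """Build CQL for steps 2 and 3: create the search index, then tune it.

    The autoCommitTime value is an illustrative assumption, not a
    recommended setting; choose it for your own workload.
    """
    target = f"{keyspace}.{table}"
    return [
        # Step 2: create the search index before loading any data.
        f"CREATE SEARCH INDEX ON {target};",
        # Step 3: a longer auto-commit interval means fewer commits during the
        # bulk load (assumed value); the change applies only after a reload.
        f"ALTER SEARCH INDEX CONFIG ON {target} SET autoCommitTime = {auto_commit_ms};",
        f"RELOAD SEARCH INDEX ON {target};",
    ]


def count_query(keyspace: str, table: str) -> str:
    """Build the step 5 query that counts all indexed records."""
    return f"SELECT count(*) FROM {keyspace}.{table} WHERE solr_query = '*:*';"


def wait_for_index_queue(
    read_queue_size: Callable[[], int],
    poll_seconds: float = 5.0,
    max_polls: int = 720,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the indexing queue is empty or max_polls readings are used.

    read_queue_size is a hypothetical callable; in a real deployment it
    would read the queue size from the IndexPool MBean (for example, via JMX).
    Returns True if a reading of 0 was seen, otherwise False.
    """
    for _ in range(max_polls):
        if read_queue_size() == 0:
            return True
        sleep(poll_seconds)
    return False


if __name__ == "__main__":
    for statement in build_index_statements("ks", "products"):
        print(statement)
    # Simulated MBean readings: the queue drains over three polls.
    readings = iter([120, 40, 0])
    if wait_for_index_queue(lambda: next(readings), poll_seconds=0.1):
        print(count_query("ks", "products"))
```

Because the sketch never opens a connection, it runs anywhere; the statements would be executed with a DataStax driver or cqlsh. To apply the step 4 load settings with the DataStax Python driver (cassandra-driver), one option is an ExecutionProfile with consistency_level=ConsistencyLevel.LOCAL_ONE and a larger request_timeout; check the driver documentation for your version.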