DSE Analytics and Search integration

DSE SearchAnalytics clusters can use DSE Search queries within DSE Analytics jobs.

An integrated DSE SearchAnalytics cluster lets analytics jobs run on the results of search queries. This integration gives finer-grained control over the queries used in analytics workloads, and improves performance because the search query reduces the amount of data that the analytics job has to process.

Nodes started in SearchAnalytics mode allow you to create analytics queries that use DSE Search indexes. These queries return RDDs that are used by Spark jobs to analyze the returned data.

The following code shows how to use a DSE Search query from the DSE Spark console.

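// Reference the solr table in the music keyspace.
val table = sc.cassandraTable("music", "solr")
// Filter with a DSE Search query, then collect the matching rows on the driver.
val result = table.select("id", "artist_name").where("solr_query='artist_name:Miles*'").collect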

For a detailed example, see Running the Wikipedia demo with SearchAnalytics.
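
When the search term comes from a variable rather than a literal, the where clause has to be assembled as a string. The following is a minimal sketch, assuming a hypothetical helper named SolrQueryClause that is not part of DSE or the Spark Cassandra Connector. It only doubles embedded single quotes so that the term cannot end the CQL string literal early; it does not escape Solr query syntax characters.

```scala
// Hypothetical helper for illustration; not part of DSE or the Spark Cassandra Connector.
object SolrQueryClause {
  // Build a CQL predicate of the form solr_query='field:term'.
  // Single quotes inside the term are doubled, as CQL string literals require.
  def apply(field: String, term: String): String = {
    val escaped = term.replace("'", "''")
    s"solr_query='$field:$escaped'"
  }
}

// Usage in the DSE Spark console, where sc is predefined:
// val result = sc.cassandraTable("music", "solr")
//   .select("id", "artist_name")
//   .where(SolrQueryClause("artist_name", "Miles*"))
//   .collect
```

Called with "artist_name" and "Miles*", the helper produces the same predicate string as the console example above.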

Planning and configuring a DSE SearchAnalytics cluster 

  1. Create DSE SearchAnalytics clusters as new clusters in a datacenter, as described in Single datacenter deployment per workload type.

    The name of the datacenter is set to SearchAnalytics when using the DseSimpleSnitch. Do not modify existing search or analytics nodes to be SearchAnalytics nodes.

  2. Set cql_solr_query_paging: driver in the dse.yaml file. SearchAnalytics nodes require this setting to make Solr queries from Spark.
  3. Perform load testing to ensure your hardware has enough CPU and memory for the additional resource overhead that is required by Spark and Solr.

    SearchAnalytics nodes might consume more resources than search or analytics nodes. Resource requirements of the nodes greatly depend on the type of query patterns you are using.
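
As a sketch of step 2, the relevant line in dse.yaml would look like the following. Only the option name and value come from the step above; the comment is explanatory, and the rest of the file is assumed to stay unchanged.

```yaml
# dse.yaml (fragment): use driver paging so that Spark can make Solr queries
cql_solr_query_paging: driver
```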

The location of the dse.yaml file depends on the type of installation:

Installation type          Location
Installer-Services         /etc/dse/dse.yaml
Package installations      /etc/dse/dse.yaml
Installer-No Services      install_location/resources/dse/conf/dse.yaml
Tarball installations      install_location/resources/dse/conf/dse.yaml

Considerations for DSE SearchAnalytics clusters 

Take care when enabling both Search and Analytics on a DSE node. Because both workloads run in addition to Cassandra, provision enough resources for all of them to run simultaneously. This includes sufficient memory and compute resources for the indexing, query, and processing demands of your use case.

SearchAnalytics clusters are appropriate for production environments, provided these environments have sufficient resources for the specific workload, as is the case for all DSE clusters.