Creating a DSE Analytics Solo datacenter

DSE Analytics Solo datacenters do not store any database or search data, but are strictly used for analytics processing. They are used in conjunction with one or more datacenters that contain database data.

Creating a DSE Analytics Solo datacenter within an existing DSE cluster

In this example scenario, an existing datacenter, DC1, contains database data. Create a new DSE Analytics Solo datacenter, DC2, which does not store any data but runs analytics jobs against the database data from DC1.

  • Make sure all keyspaces in the DC1 datacenter use NetworkTopologyStrategy. If necessary, alter the keyspace.
    ALTER KEYSPACE mykeyspace
    WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 };
  • Add nodes to a new datacenter named DC2, then enable Analytics on those nodes.
  • Configure the dse_leases and spark_system keyspaces to replicate to both DC1 and DC2. For example:
    ALTER KEYSPACE dse_leases
    WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 3 };
  • When submitting Spark applications, specify a --master URL that targets the DC2 datacenter, either with the name or IP address of a DC2 node or with the connection.local_dc=DC2 parameter used in this example, and set the spark.cassandra.connection.local_dc configuration option to DC1.
    dse spark-submit --master "dse://?connection.local_dc=DC2" --class com.datastax.dse.demo.loss.Spark10DayLoss --conf "spark.cassandra.connection.local_dc=DC1" portfolio.jar
    The Spark workers read the data from DC1.
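
The keyspace steps above show only the dse_leases statement, so the remaining CQL could be sketched as follows. This is a minimal sketch, not a definitive procedure: it assumes the schema tables of Cassandra 3.0 and later (system_schema.keyspaces) are available for checking replication settings, and it carries over the replication factor of 3 from the dse_leases example. Adjust the datacenter names and replication factors to match your cluster.

```cql
-- List every keyspace with its replication settings, to spot any that
-- do not yet use NetworkTopologyStrategy.
-- Assumption: the cluster exposes the system_schema tables.
SELECT keyspace_name, replication FROM system_schema.keyspaces;

-- Replicate spark_system to both datacenters, mirroring the dse_leases
-- example. The replication factor of 3 is an assumption taken from
-- that example.
ALTER KEYSPACE spark_system
WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 3 };
```

Running the SELECT again after the ALTER statements is a simple way to confirm that both keyspaces list DC1 and DC2.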

Accessing an external DSE transactional cluster from a DSE Analytics Solo cluster

To access an external DSE transactional cluster, explicitly set the connection to the transactional cluster when creating RDDs or Datasets within the application.

In the following examples, the external DSE transactional cluster has a node running on 10.10.0.2.

To create an RDD from the transactional cluster's data:

import com.datastax.spark.connector._
import com.datastax.spark.connector.cql._
import org.apache.spark.SparkContext

def analyticsSoloExternalDataExample(sc: SparkContext) = {
  // Connector that points at a node in the external transactional cluster
  val connectorToTransactionalCluster = CassandraConnector(sc.getConf.set("spark.cassandra.connection.host", "10.10.0.2"))

  val rddFromTransactionalCluster = {
    // Sets connectorToTransactionalCluster as the default connection for everything in this code block
    implicit val c = connectorToTransactionalCluster
    // Get the data from the test.words table
    sc.cassandraTable("test","words")
  }

  // Return the RDD so that callers can apply further transformations to it
  rddFromTransactionalCluster
}

To create a Dataset from the transactional cluster's data:

import org.apache.spark.sql.cassandra._
import com.datastax.spark.connector.cql.CassandraConnectorConf

// set params for the particular cluster
spark.setCassandraConf("TransactionalCluster", CassandraConnectorConf.ConnectionHostParam.option("10.10.0.2"))

val df = spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "words", "keyspace" -> "test", "cluster" -> "TransactionalCluster"))
  .load()

When you submit the application to the DSE Analytics Solo cluster, it retrieves the data from the external DSE transactional cluster.