Creating a DSE Analytics Solo datacenter
DSE Analytics Solo datacenters do not store any database or search data, but are strictly used for analytics processing. They are used in conjunction with one or more datacenters that contain database data.
Creating a DSE Analytics Solo datacenter within an existing DSE cluster
In this example scenario, there is an existing datacenter, DC1, which contains database data. Create a new DSE Analytics Solo datacenter, DC2, which does not store any data but will perform analytics jobs using the database data from DC1.
- Make sure all keyspaces in the DC1 datacenter use NetworkTopologyStrategy. If necessary, alter the keyspace:

```sql
ALTER KEYSPACE mykeyspace WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 };
```
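To confirm the current replication strategy before altering anything, one option is cqlsh's DESCRIBE command (mykeyspace is the placeholder keyspace name from the example above):

```sql
DESCRIBE KEYSPACE mykeyspace;
```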
- Add nodes to a new datacenter named DC2, then enable Analytics on those nodes, as shown in the sketch below.
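How Analytics is enabled depends on the installation type. A minimal sketch for the two common cases, assuming default install locations:

```bash
# Package installations: enable Spark mode in /etc/default/dse, then restart
#   SPARK_ENABLED=1
sudo service dse restart

# Tarball installations: start the node with Spark mode enabled
dse cassandra -k
```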
- Configure the dse_leases and dse_analytics keyspaces to replicate to both DC1 and DC2. For example:

```sql
ALTER KEYSPACE dse_leases WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 3 };
```
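The dse_analytics keyspace needs the same replication; a sketch mirroring the example above (the replication factors are illustrative):

```sql
ALTER KEYSPACE dse_analytics WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 3 };
```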
- When submitting Spark applications, specify the --master URL with the name or IP address of a node in the DC2 datacenter, and set the spark.cassandra.connection.local_dc configuration option to DC1. For example:

```bash
dse spark-submit --master "dse://?connection.local_dc=DC2" \
  --class com.datastax.dse.demo.loss.Spark10DayLoss \
  --conf "spark.cassandra.connection.local_dc=DC1" portfolio.jar
```

The Spark workers read the data from DC1.
Accessing an external DSE transactional cluster from a DSE Analytics Solo cluster
To access an external DSE transactional cluster, explicitly set the connection to the transactional cluster when creating RDDs or Datasets within the application.
In the following examples, the external DSE transactional cluster has a node running on 10.10.0.2.
To create an RDD from the transactional cluster's data:
```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql._
import org.apache.spark.SparkContext

def analyticsSoloExternalDataExample(sc: SparkContext) = {
  val connectorToTransactionalCluster =
    CassandraConnector(sc.getConf.set("spark.cassandra.connection.host", "10.10.0.2"))
  val rddFromTransactionalCluster = {
    // Sets connectorToTransactionalCluster as the default connection for everything in this code block
    implicit val c = connectorToTransactionalCluster
    // Get the data from the test.words table
    sc.cassandraTable("test", "words")
  }
}
```
To create a Dataset from the transactional cluster's data:
```scala
import org.apache.spark.sql.cassandra._
import com.datastax.spark.connector.cql.CassandraConnectorConf

// Set connection parameters for the transactional cluster
spark.setCassandraConf("TransactionalCluster",
  CassandraConnectorConf.ConnectionHostParam.option("10.10.0.2"))

val df = spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "table" -> "words",
    "keyspace" -> "test",
    "cluster" -> "TransactionalCluster")) // use the cluster-level configuration set above
  .load()
```
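As a quick sanity check, an action such as show() forces the read from the transactional cluster (a sketch):

```scala
// Print a few rows of test.words fetched from the transactional cluster
df.show(10)
```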
When you submit the application to the DSE Analytics Solo cluster, it will retrieve the data from the external DSE transactional cluster.
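For example, a submission might look like the following sketch; the master address, application class, and jar name are placeholders, and the application is assumed to set the transactional cluster connection as shown above:

```bash
dse spark-submit --master "dse://10.10.0.100" \
  --class com.example.WordsApp words-app.jar
```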