Using DSE Apache Spark™ with third-party tools and integrations

The dse exec command sets the environment variables required to run third-party tools that integrate with Spark.

dse exec command

The dse exec command was introduced in DSE 5.1.6.

If the tool is run on a server that is not part of the DSE cluster, see Running Spark commands against a remote cluster.
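As a quick sanity check, a tool launched through dse exec can inspect its own environment. The sketch below assumes that SPARK_HOME is among the variables dse exec sets (the RStudio example on this page reads it). The helper name missing_env_vars and its default variable list are illustrative, not part of DSE.

```python
import os


def missing_env_vars(required=("SPARK_HOME",), env=None):
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to the current process environment. The default list of
    required variables is an assumption made for illustration.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Not set:", ", ".join(missing), "(was the tool started with dse exec?)")
    else:
        print("SPARK_HOME =", os.environ["SPARK_HOME"])
```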

Jupyter integration

Download and install Jupyter notebook on a DSE node.

To launch Jupyter notebook:

dse exec jupyter notebook

A Jupyter notebook starts with the correct Python path. You must create a context to work with DSE. In contrast to Livy and Zeppelin integrations, the Jupyter integration does not start an interpreter that creates a context.
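Because the notebook does not create a context for you, the first cell typically builds one. The following is a minimal sketch that assumes PySpark is importable in a notebook started with dse exec jupyter notebook, and that DSE's Spark configuration supplies the default master. The function name create_dse_session and the application name are hypothetical. Passing master="dse:///" is optional and reuses the master URL shown for Livy and Zeppelin on this page.

```python
def create_dse_session(app_name="jupyter-dse", master=None):
    """Build (or reuse) a SparkSession from inside a notebook cell.

    Assumption: PySpark is on the Python path because the notebook was
    started with `dse exec jupyter notebook`. The import is deferred so that
    defining this function does not itself require PySpark.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    if master is not None:
        # For example "dse:///", the master URL used elsewhere on this page.
        builder = builder.master(master)
    # getOrCreate() returns the existing session if one is already active.
    return builder.getOrCreate()
```

In a notebook cell, spark = create_dse_session() followed by spark.range(5).show() is one way to confirm that the session works.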

Livy integration

Download and install Livy on a DSE node. By default, Livy runs Spark in local mode. Before starting Livy, create a configuration file by copying conf/livy.conf.template to conf/livy.conf, then uncomment or add the following two properties:

livy.spark.master = dse:///
livy.repl.enable-hive-context = true
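The copy-and-edit step can also be scripted. The sketch below is one possible approach, not a DataStax-provided tool: the function name write_livy_conf is hypothetical, and it assumes the Livy installation directory contains conf/livy.conf.template as described above. It appends the two properties only if they are not already active, and it does not change a property that is already set to a different value.

```python
import shutil
from pathlib import Path

# The two properties named in this section.
DSE_PROPERTIES = {
    "livy.spark.master": "dse:///",
    "livy.repl.enable-hive-context": "true",
}


def write_livy_conf(livy_home):
    """Create conf/livy.conf from the template and add the DSE properties."""
    conf_dir = Path(livy_home) / "conf"
    target = conf_dir / "livy.conf"
    if not target.exists():
        shutil.copyfile(conf_dir / "livy.conf.template", target)

    lines = target.read_text().splitlines()
    # Keys that are already set on an uncommented line.
    active = {
        line.split("=", 1)[0].strip()
        for line in lines
        if "=" in line and not line.lstrip().startswith("#")
    }
    additions = [f"{key} = {value}" for key, value in DSE_PROPERTIES.items()
                 if key not in active]
    if additions:
        target.write_text("\n".join(lines + additions) + "\n")
    return target
```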

To launch Livy:

dse exec livy-server

RStudio integration

Download and install R on all DSE Analytics nodes, install RStudio desktop on one of the nodes, then run RStudio:

dse exec rstudio

In the RStudio session start a Spark session:

library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session()

These instructions are for RStudio desktop, not RStudio Server. In multiuser environments, we recommend JDBC connections rather than SparkR.

Zeppelin integration

Download and install Zeppelin on a DSE node. To launch Zeppelin server:

dse exec zeppelin.sh

By default, Zeppelin runs Spark in local mode. On the Interpreters configuration page, update the master property in the Spark section to dse:///. No configuration file changes are required to run Zeppelin.

