Using DSE Spark with third-party tools and integrations
The dse exec command sets the environment variables required to run third-party tools that integrate with Spark.
If the tool is run on a server that is not part of the DSE cluster, see Running Spark commands against a remote cluster.
Jupyter integration
Download and install Jupyter Notebook on a DSE node, then launch it:
dse exec jupyter notebook
The notebook starts with the correct Python path, but unlike the Livy and Zeppelin integrations, Jupyter does not start an interpreter that creates a context for you. To work with DSE you must create the context yourself, as sketched below.
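For example, a minimal sketch of a first notebook cell, assuming the notebook was launched with dse exec so that pyspark is importable; the application name, keyspace, and table are placeholders:

from pyspark.sql import SparkSession

# Build (or reuse) a Spark session; dse:/// is the DSE Spark master URL.
spark = SparkSession.builder \
    .appName("jupyter-dse") \
    .master("dse:///") \
    .getOrCreate()

# Hypothetical keyspace and table, used only for illustration.
df = (spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(keyspace="my_keyspace", table="my_table")
      .load())
df.show(5)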
Livy integration
Download and install Livy on a DSE node. By default Livy runs Spark in local mode. Before starting Livy, create a configuration file by copying conf/livy.conf.template to conf/livy.conf, then uncomment or add the following two properties:
livy.spark.master = dse:///
livy.repl.enable-hive-context = true
To launch Livy:
dse exec livy-server
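Once Livy is running you can verify the integration through its REST API. A minimal sketch in Python, assuming the requests library and Livy's default host and port:

import time
import requests

livy = "http://localhost:8998"  # default Livy port; adjust for your node

# Create an interactive PySpark session; the dse:/// master comes from livy.conf.
session = requests.post(livy + "/sessions", json={"kind": "pyspark"}).json()

# Wait for the session to become idle, then run a statement on the cluster.
while requests.get(f"{livy}/sessions/{session['id']}/state").json()["state"] != "idle":
    time.sleep(1)

requests.post(f"{livy}/sessions/{session['id']}/statements",
              json={"code": "spark.range(10).count()"})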
RStudio integration
Download and install R on all DSE Analytics nodes, install RStudio Desktop on one of the nodes, then run RStudio:
dse exec rstudio
In the RStudio session, start a Spark session:
# Load the SparkR library that ships with the DSE Spark installation.
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
# Start the Spark session; dse exec sets the environment so it runs against DSE.
sparkR.session()
Zeppelin integration
Download and install Zeppelin on a DSE node, then launch the Zeppelin server:
dse exec zeppelin.sh
By default Zeppelin runs Spark in local mode. Update the master property to dse:/// in the Spark interpreter settings on the Interpreters configuration page. No configuration file changes are required to run Zeppelin.
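As an illustrative check, assuming the Zeppelin Spark interpreter group (with PySpark) is bound to the note, a paragraph like the following reads a hypothetical table through the DSE cluster:

%pyspark
# Hypothetical keyspace and table; a successful count shows the interpreter reaches the cluster.
spark.read \
    .format("org.apache.spark.sql.cassandra") \
    .options(keyspace="my_keyspace", table="my_table") \
    .load() \
    .count()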