Use DSE Spark with third-party tools and integrations

The dse exec command sets the environment variables required to run third-party tools that integrate with Apache Spark™.

dse exec <command>

If the tool is run on a server that is not part of the DSE cluster, see Running Apache Spark commands against a remote cluster.

Jupyter integration

Download and install Jupyter Notebook on a DSE node.

To launch Jupyter Notebook:

dse exec jupyter notebook

Jupyter Notebook starts with the correct Python path, but you must create a Spark context yourself to work with DSE. Unlike the Livy and Zeppelin integrations, the Jupyter integration does not start an interpreter that creates a context for you.
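The following is a minimal sketch of a first notebook cell that creates that context. It assumes the notebook was started with dse exec so that pyspark is importable, and it sets the master explicitly to dse:///, the same value this page uses for Livy and Zeppelin. The function name create_dse_session and the default application name are hypothetical.

```python
def create_dse_session(app_name="jupyter-dse", master="dse:///"):
    """Create (or reuse) a SparkSession that runs against the DSE cluster.

    Hypothetical helper; assumes the notebook was launched with
    `dse exec jupyter notebook` so the DSE Spark environment is set.
    """
    # Imported inside the function so the cell only needs pyspark when called.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .master(master)  # assumption: dse:/// is accepted here as in Livy/Zeppelin
        .getOrCreate()
    )

# In the notebook:
# spark = create_dse_session()
# spark.range(10).count()
```

Because getOrCreate() returns the existing session if one is already active, re-running the cell does not start a second application.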

Livy integration

Download and install Livy on a DSE node. By default, Livy runs Apache Spark in local mode. Before starting Livy, create a configuration file by copying conf/livy.conf.template to conf/livy.conf, then uncomment or add the following two properties:

livy.spark.master = dse:///
livy.repl.enable-hive-context = true
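If you configure several nodes, the copy-and-edit step can be scripted. The sketch below assumes the template uses the key = value form shown above, with disabled defaults commented out by a leading #. The function name write_livy_conf is hypothetical; it replaces the first line (commented or not) that sets each key and appends any key not found in the template.

```python
from pathlib import Path

# The two properties required for DSE, taken from the instructions above.
DSE_LIVY_PROPERTIES = {
    "livy.spark.master": "dse:///",
    "livy.repl.enable-hive-context": "true",
}


def write_livy_conf(livy_home, properties=DSE_LIVY_PROPERTIES):
    """Copy conf/livy.conf.template to conf/livy.conf and set `properties`."""
    conf_dir = Path(livy_home) / "conf"
    template_lines = (conf_dir / "livy.conf.template").read_text().splitlines()
    remaining = dict(properties)
    out = []
    for line in template_lines:
        # Drop leading '#' characters so commented-out defaults also match.
        candidate = line.lstrip().lstrip("#").strip()
        key = candidate.split("=", 1)[0].strip()
        if "=" in candidate and key in remaining:
            out.append(f"{key} = {remaining.pop(key)}")  # uncomment and set
        else:
            out.append(line)  # keep every other line unchanged
    # Add any property that did not appear in the template at all.
    out.extend(f"{key} = {value}" for key, value in remaining.items())
    conf_path = conf_dir / "livy.conf"
    conf_path.write_text("\n".join(out) + "\n")
    return conf_path
```

For example, write_livy_conf("/opt/livy") would write /opt/livy/conf/livy.conf, where /opt/livy stands in for your Livy installation directory.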

To launch Livy:

dse exec livy-server
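Once the server is running, clients talk to it over Livy's REST API, for example by creating a session with POST /sessions and submitting code with POST /sessions/{id}/statements. The sketch below uses only the Python standard library and assumes Livy is listening on its default port 8998 on the local node. The helper names livy_request and send are hypothetical.

```python
import json
import urllib.request

# Assumption: Livy runs on this node and listens on its default port.
LIVY_URL = "http://localhost:8998"


def livy_request(path, payload, base_url=LIVY_URL):
    """Build a JSON POST request for a Livy REST endpoint."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

# Against a running Livy server:
# session = send(livy_request("/sessions", {"kind": "pyspark"}))
# ...wait until GET /sessions/{id} reports state "idle", then:
# send(livy_request(f"/sessions/{session['id']}/statements",
#                   {"code": "spark.range(10).count()"}))
```

Building the request separately from sending it keeps the payload easy to inspect before anything is submitted to the cluster.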

RStudio integration

Download and install R on all DSE Analytics nodes, install RStudio Desktop on one of the nodes, then run RStudio:

dse exec rstudio

In the RStudio session, start a Spark session:

library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session()

These instructions are for RStudio Desktop, not RStudio Server. In multiuser environments, we recommend using AlwaysOn SQL and JDBC connections rather than SparkR.

Zeppelin integration

Download and install Zeppelin on a DSE node.

To launch the Zeppelin server:

dse exec zeppelin.sh

By default, Zeppelin runs Apache Spark in local mode. On the Interpreters configuration page, set the master property of the Spark interpreter to dse:///. No configuration file changes are required to run Zeppelin.
