Using DSE Apache Spark™ with third-party tools and integrations
The dse exec command sets the environment variables required to run third-party tools that integrate with Spark.
If the tool is run on a server that is not part of the DSE cluster, see Running Spark commands against a remote cluster.
Download and install Jupyter notebook on a DSE node.
To launch Jupyter notebook:
dse exec jupyter notebook
The Jupyter notebook starts with the correct Python path. Unlike the Livy and Zeppelin integrations, the Jupyter integration does not start an interpreter that creates a Spark context for you, so you must create the context yourself to work with DSE.
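For example, the first cell of the notebook can create the Spark session itself. A minimal sketch, assuming PySpark is importable in the environment that dse exec set up; the dse:/// master URL is DSE's Spark master scheme (used again in the Livy and Zeppelin sections below), and the application name is arbitrary:

```python
# Hypothetical first notebook cell: create the Spark session yourself,
# because the Jupyter integration does not create a context for you.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("dse:///")            # DSE's Spark master URL scheme
    .appName("jupyter-example")   # illustrative application name
    .getOrCreate()
)

# Simple smoke test of the new session.
spark.range(10).count()
```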
Download and install Livy on a DSE node.
By default Livy runs Spark in local mode.
Before starting Livy, create a configuration file by copying the configuration template shipped in Livy's conf directory (for example, conf/livy.conf.template to conf/livy.conf).
Uncomment or add the following two properties:
livy.spark.master = dse:///
livy.repl.enable-hive-context = true
To launch Livy:
dse exec livy-server
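Once the server is running (port 8998 by default), clients interact with Livy over its REST API: POST /sessions creates an interactive session, and POST /sessions/{id}/statements runs code in it. A minimal sketch using only the Python standard library; the LIVY_URL value and the helper function names are assumptions for illustration, while the endpoints and JSON payloads follow Livy's documented REST API:

```python
import json
from urllib.request import Request

LIVY_URL = "http://localhost:8998"  # assumption: Livy on its default port

def create_session_request(kind="pyspark"):
    # POST /sessions starts a new interactive session of the given kind.
    body = json.dumps({"kind": kind}).encode("utf-8")
    return Request(f"{LIVY_URL}/sessions", data=body,
                   headers={"Content-Type": "application/json"})

def submit_statement_request(session_id, code):
    # POST /sessions/{id}/statements runs a code snippet in that session.
    body = json.dumps({"code": code}).encode("utf-8")
    return Request(f"{LIVY_URL}/sessions/{session_id}/statements", data=body,
                   headers={"Content-Type": "application/json"})

# Build (but do not send) the requests; send them with
# urllib.request.urlopen(req) once livy-server is up.
session_req = create_session_request()
stmt_req = submit_statement_request(0, "spark.range(10).count()")
```

Because livy.spark.master is set to dse:///, statements submitted this way run against the DSE cluster rather than a local-mode Spark.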
Install R on all DSE Analytics nodes, install RStudio Desktop on one of the nodes, and then run RStudio:
dse exec rstudio
In the RStudio session, start a Spark session:
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session()
These instructions are for RStudio Desktop, not RStudio Server. In multi-user environments, we recommend JDBC connections rather than SparkR.
Download and install Zeppelin on a DSE node. To launch Zeppelin server:
dse exec zeppelin.sh
By default Zeppelin runs Spark in local mode.
Update the master property to dse:/// in the Spark interpreter settings on the Interpreters configuration page.
No configuration file changes are required to run Zeppelin.