Using DSE Apache Spark™ with third-party tools and integrations
The dse exec command sets the environment variables required to run third-party tools that integrate with Spark.
If the tool is run on a server that is not part of the DSE cluster, see Running Spark commands against a remote cluster.
Jupyter integration
Download and install Jupyter notebook on a DSE node.
To launch Jupyter notebook:
dse exec jupyter notebook
The Jupyter notebook starts with the correct Python path, but you must create a Spark context or session yourself to work with DSE. Unlike the Livy and Zeppelin integrations, the Jupyter integration does not start an interpreter that creates a context for you.
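As an illustration, the following sketch creates a Spark session from inside a notebook launched with dse exec jupyter notebook. It assumes pyspark is already on the Python path that dse exec configures; the application name and the explicit dse:/// master (the DSE Spark master URL used elsewhere on this page) are illustrative.
# Minimal sketch: create a Spark session inside a dse exec-launched notebook.
# Assumes dse exec has put pyspark on the Python path; the dse:/// master
# matches the DSE master URL used in the Livy and Zeppelin sections below.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dse-jupyter-example")
         .master("dse:///")
         .getOrCreate())

print(spark.sparkContext.master)  # confirm the session is attached to DSE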
Livy integration
Download and install Livy on a DSE node.
By default Livy runs Spark in local mode.
Before starting Livy, create a configuration file by copying conf/livy.conf.template to conf/livy.conf.
Uncomment or add the following two properties:
livy.spark.master = dse:///
livy.repl.enable-hive-context = true
To launch Livy:
dse exec livy-server
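Once the server is up, code is submitted through Livy's REST API rather than directly in a shell. The following sketch assumes Livy is listening on its default port 8998 on the local node and uses the Python requests package; it creates a PySpark session, waits for it to become idle, and runs one statement.
# Minimal sketch: drive a running Livy server over its REST API.
# Assumes the default Livy port 8998 and the requests package.
import time
import requests

livy = "http://localhost:8998"

# Create an interactive PySpark session.
session = requests.post(livy + "/sessions", json={"kind": "pyspark"}).json()
session_url = "{}/sessions/{}".format(livy, session["id"])

# Wait for the session to become idle before submitting code.
while requests.get(session_url).json()["state"] != "idle":
    time.sleep(2)

# Submit a statement and poll until its result is available.
stmt = requests.post(session_url + "/statements",
                     json={"code": "sc.parallelize(range(10)).sum()"}).json()
stmt_url = "{}/statements/{}".format(session_url, stmt["id"])
while requests.get(stmt_url).json()["state"] != "available":
    time.sleep(2)

print(requests.get(stmt_url).json()["output"])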
RStudio integration
Download and install R on all DSE Analytics nodes, install RStudio Desktop on one of the nodes, and then run RStudio:
dse exec rstudio
In the RStudio session start a Spark session:
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session()
These instructions are for RStudio Desktop, not RStudio Server. In multiuser environments, we recommend JDBC connections rather than SparkR.
Zeppelin integration
Download and install Zeppelin on a DSE node. To launch the Zeppelin server:
dse exec zeppelin.sh
By default Zeppelin runs Spark in local mode.
Update the master property to dse:/// in the Spark interpreter settings on the Interpreters configuration page.
No configuration file changes are required to run Zeppelin.
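After changing the interpreter setting, one quick check that notes now run against DSE rather than local mode is to print the master from a notebook paragraph. A minimal sketch, assuming the PySpark interpreter is enabled (sc is the SparkContext that Zeppelin creates for the note):
%pyspark
# Minimal sketch: confirm the note's Spark master after the interpreter change.
print(sc.master)  # expected to print dse:/// once the setting is applied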