Using DSE Spark with third-party tools and integrations
The dse exec command sets the environment variables required to run third-party tools that integrate with Spark.
If the tool is run on a server that is not part of the DSE cluster, see Running Spark commands against a remote cluster.
Jupyter integration
Download and install Jupyter Notebook on a DSE node, then launch it:
dse exec jupyter notebook
The notebook starts with the correct Python path, but unlike the Livy and Zeppelin integrations, Jupyter does not start an interpreter that creates a context for you. To work with DSE you must create the context yourself, as sketched below.
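For example, a minimal sketch of a first notebook cell, assuming the notebook was launched with dse exec so that pyspark is importable; the application name, keyspace, and table are placeholders:

from pyspark.sql import SparkSession

# Build (or reuse) a Spark session; dse:/// is the DSE Spark master URL.
spark = SparkSession.builder \
    .appName("jupyter-dse") \
    .master("dse:///") \
    .getOrCreate()

# Hypothetical keyspace and table, used only for illustration.
df = (spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(keyspace="my_keyspace", table="my_table")
      .load())
df.show(5)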
Livy integration
Download and install Livy on a DSE node. By default Livy runs Spark in local mode. Before starting Livy, create a configuration file by copying conf/livy.conf.template to conf/livy.conf, then uncomment or add the following two properties:
livy.spark.master = dse:///
livy.repl.enable-hive-context = true
To launch Livy:
dse exec livy-server
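Once Livy is running you can verify the integration through its REST API. A minimal sketch in Python, assuming the requests library and Livy's default host and port:

import time
import requests

livy = "http://localhost:8998"  # default Livy port; adjust for your node

# Create an interactive PySpark session; the dse:/// master comes from livy.conf.
session = requests.post(livy + "/sessions", json={"kind": "pyspark"}).json()

# Wait for the session to become idle, then run a statement on the cluster.
while requests.get(f"{livy}/sessions/{session['id']}/state").json()["state"] != "idle":
    time.sleep(1)

requests.post(f"{livy}/sessions/{session['id']}/statements",
              json={"code": "spark.range(10).count()"})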
RStudio integration
Download and install R on all DSE Analytics nodes, install RStudio Desktop on one of the nodes, then run RStudio:
dse exec rstudio
In the RStudio session, start a Spark session:
# Load the SparkR library that ships with the DSE Spark installation.
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
# Start the Spark session; dse exec sets the environment so it runs against DSE.
sparkR.session()
Zeppelin integration
Download and install Zeppelin on a DSE node, then launch the Zeppelin server:
dse exec zeppelin.sh
By default Zeppelin runs Spark in local mode. Update the master property to dse:/// in the Spark interpreter settings on the Interpreters configuration page. No configuration file changes are required to run Zeppelin.
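As an illustrative check, assuming the Zeppelin Spark interpreter group (with PySpark) is bound to the note, a paragraph like the following reads a hypothetical table through the DSE cluster:

%pyspark
# Hypothetical keyspace and table; a successful count shows the interpreter reaches the cluster.
spark.read \
    .format("org.apache.spark.sql.cassandra") \
    .options(keyspace="my_keyspace", table="my_table") \
    .load() \
    .count()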