Using the Spark Jobserver
DataStax Enterprise includes a bundled copy of the open-source Spark Jobserver, an optional component that provides a REST interface for submitting and managing Spark jobs, Spark contexts, and JARs on DSE Analytics clusters. Refer to the Components section of the release notes to find the version of the Spark Jobserver included in this version of DSE.
- Start the Jobserver, optionally passing any spark-submit options (see the example below):
dse spark-jobserver start [any_spark_submit_options]
- Stop the Jobserver:
dse spark-jobserver stop
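For example, a minimal sketch of starting the Jobserver with spark-submit options passed through (the option values shown are illustrative, not required):
# Start the Jobserver with example resource settings forwarded to spark-submit
dse spark-jobserver start --conf spark.cores.max=4 --conf spark.executor.memory=2g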
The Spark Jobserver is installed in the following location:
- Package installations: /usr/share/dse/spark/spark-jobserver
- Tarball installations: installation_location/resources/spark/spark-jobserver
All uploaded JARs, temporary files, and log files are created in the user's $HOME/.spark-jobserver directory, which is created the first time the Spark Jobserver starts.
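Once the Jobserver is running, jobs are managed through its REST API, which listens on port 8090 by default. A minimal sketch of uploading and running an application, assuming the default host and port and using placeholder JAR, application, and class names:
# Upload an application JAR under the application name "myapp" (path and name are placeholders)
curl --data-binary @/path/to/myapp.jar localhost:8090/jars/myapp
# Submit a job from the uploaded JAR ("com.example.MyJob" is a placeholder job class)
curl -d "" 'localhost:8090/jobs?appName=myapp&classPath=com.example.MyJob'
# Check the status of submitted jobs
curl localhost:8090/jobs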
Beneficial use cases for the Spark Jobserver include sharing cached data, repeated queries of cached data, and faster job starts.
Running multiple SparkContext instances in a single JVM is not recommended. Therefore, avoid creating a new SparkContext for each job submitted to a single Spark Jobserver instance. We recommend one of the two following Spark Jobserver usage modes:
- Persistent Context Mode: a single pre-created SparkContext is shared by all jobs.
- Context per JVM: each job has its own SparkContext in a separate JVM.
By default, the H2 database is used for storing Spark Jobserver metadata. In this setup, using Context per JVM requires additional configuration.
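For example, in Persistent Context Mode a named context is created up front and jobs are routed to it. A minimal sketch against the REST API, assuming the default port and reusing the placeholder names from the earlier example (resource settings are illustrative):
# Create a long-lived context named "shared-context"
curl -d "" 'localhost:8090/contexts/shared-context?num-cpu-cores=2&memory-per-node=512m'
# Run a job in the shared context and wait synchronously for the result
curl -d "" 'localhost:8090/jobs?appName=myapp&classPath=com.example.MyJob&context=shared-context&sync=true'
# List active contexts
curl localhost:8090/contexts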
Note: In Context per JVM mode, job results must not contain instances of classes that are not present in the Spark Jobserver classpath. Problems with returning types unknown to the server can be recognized by a log line like the following:
Association with remote system [akka.tcp://JobServer@127.0.0.1:45153] has failed, address is now gated for [5000] ms. Reason: [<unknown type name is placed here>]
Consult the Spark Jobserver documentation for configuration details.
For an example of how to create and submit an application through the Spark Jobserver, see the spark-jobserver demo included with DSE:
- Package installations: /usr/share/dse/demos
- Tarball installations: installation_location/demos
Enabling SSL communication with Jobserver
To enable SSL encryption when connecting to the Jobserver, you must have a server certificate and a truststore containing the certificate. Add the following configuration section to the dse.conf file in the Spark Jobserver directory:
spray.can.server {
ssl-encryption = on
keystore = "path to keystore"
keystorePW = "keystore password"
}
The Spark Jobserver directory is located in:
- Package installations: /usr/share/dse/spark/spark-jobserver
- Tarball installations: installation_location/resources/spark/spark-jobserver
Restart the Jobserver after saving the configuration changes.
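For example, a server keystore can be generated with the JDK keytool utility. A minimal sketch, assuming a self-signed certificate is acceptable in your environment (all paths, names, and passwords are placeholders):
# Generate a self-signed key pair in a new keystore (placeholder values throughout)
keytool -genkeypair -alias jobserver -keyalg RSA -validity 365 -keystore /path/to/jobserver.jks -storepass myStorePass -dname "CN=jobserver.example.com, OU=Analytics, O=Example, C=US"
# After updating dse.conf and restarting the Jobserver, verify HTTPS connectivity
# (-k skips certificate verification; use it only with self-signed test certificates)
curl -k https://localhost:8090/contexts
Point the keystore and keystorePW settings shown above at the generated keystore file and its password.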