Using the Apache Spark™ Jobserver
DataStax Enterprise includes a bundled copy of the open-source Spark Jobserver, an optional component for submitting and managing Spark jobs, Spark contexts, and JARs on DSE Analytics clusters. See the Components section of the release notes for the version of the Spark Jobserver included in this version of DSE.
Any valid spark-submit option can be passed to the Spark Jobserver when it starts. To use the Jobserver:
- Start the job server:
  dse spark-jobserver start [any_spark_submit_options]
- Stop the job server:
  dse spark-jobserver stop
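The start and stop commands above can be wrapped in small shell functions, for example to restart the Jobserver with the same options. This is a minimal sketch, not part of DSE: the function names and the DSE_CMD variable are hypothetical, and DSE_CMD exists only so the commands can be previewed without running them.

```shell
#!/bin/sh
# DSE_CMD is a hypothetical override; it defaults to the real "dse" command.
# Setting DSE_CMD="echo dse" prints each command instead of running it.
DSE_CMD="${DSE_CMD:-dse}"

# Start the Jobserver, forwarding any spark-submit options unchanged.
jobserver_start() {
    # $DSE_CMD is intentionally unquoted so "echo dse" splits into two words.
    $DSE_CMD spark-jobserver start "$@"
}

# Stop the Jobserver.
jobserver_stop() {
    $DSE_CMD spark-jobserver stop
}

# Stop, then start again with the given spark-submit options.
jobserver_restart() {
    jobserver_stop
    jobserver_start "$@"
}
```

For example, `jobserver_start --conf spark.cores.max=4 --executor-memory 2G` passes two standard spark-submit options through to the Jobserver; the values shown are illustrative only.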
The default location of the Spark Jobserver depends on the type of installation:
- Package installations and Installer-Services:
  /usr/share/dse/spark/spark-jobserver
- Tarball installations and Installer-No Services:
  <installation_location>/resources/spark/spark-jobserver
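A script that must work on either installation type can probe both default locations. The sketch below assumes a hypothetical DSE_HOME variable that stands in for <installation_location>; DSE itself does not necessarily define it.

```shell
#!/bin/sh
# Print the first default Jobserver directory that exists; return 1 if none does.
# DSE_HOME is a hypothetical variable standing in for <installation_location>.
find_jobserver_dir() {
    for dir in \
        "/usr/share/dse/spark/spark-jobserver" \
        "${DSE_HOME:+$DSE_HOME/resources/spark/spark-jobserver}"
    do
        # The second entry is empty when DSE_HOME is unset, so skip empty values.
        if [ -n "$dir" ] && [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

The package location is checked first, so on a host with both installation types the package directory is the one reported.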
All uploaded JARs, temporary files, and log files are created in the user’s $HOME/.spark-jobserver directory, which is created the first time the Spark Jobserver starts.
Beneficial use cases for the Spark Jobserver include sharing cached data, repeated queries of cached data, and faster job starts.
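Running multiple SparkContext instances in a single JVM is not recommended. Use one of the following two modes instead: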
- Persistent Context Mode: a single pre-created SparkContext shared by all jobs.
- Context per JVM: each job has its own SparkContext in a separate JVM. See the Spark Jobserver docs for details.
In Context per JVM mode, job results must not contain instances of classes that are not present in the Spark Jobserver classpath. Problems caused by returning types unknown to the server can be recognized by the following log line:
Association with remote system [akka.tcp://JobServer@127.0.0.1:45153] has failed, address is now gated for [5000] ms. Reason: [<unknown type name is placed here>]
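One way to check for this failure is to search the Jobserver log files for the fixed part of the message. The sketch below assumes the logs are in the $HOME/.spark-jobserver directory described earlier; the function name is hypothetical, and `grep -r` is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Search Jobserver logs for the "address is now gated" association failure.
# $1 is the log directory; it defaults to $HOME/.spark-jobserver.
# Prints matching lines with file name and line number, and returns
# non-zero when nothing matches.
find_gated_failures() {
    log_dir="${1:-$HOME/.spark-jobserver}"
    grep -rn "address is now gated" "$log_dir"
}
```

The `Reason: [...]` part of each matching line names the type that is missing from the Jobserver classpath.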
See the Spark Jobserver docs for configuration details.
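As an illustration of Persistent Context Mode, the open-source Spark Jobserver exposes REST endpoints for creating a named context and running jobs in it. The sketch below assumes that the Jobserver listens on localhost:8090 (the open-source project's default) and that the bundled DSE copy exposes the same endpoints. The function names, the JOBSERVER_URL and CURL variables, and all argument values are hypothetical.

```shell
#!/bin/sh
# Assumption: the REST API is reachable at this URL (open-source default port).
JOBSERVER_URL="${JOBSERVER_URL:-http://localhost:8090}"
# CURL is a hypothetical override; CURL="echo curl" previews the requests.
CURL="${CURL:-curl}"

# Create a named, long-lived context that later jobs can share.
# $1 = context name
create_context() {
    $CURL -d "" "$JOBSERVER_URL/contexts/$1"
}

# Run a job from an already uploaded application in an existing context.
# $1 = application name, $2 = job class, $3 = context name, $4 = job config
submit_job() {
    $CURL -d "$4" "$JOBSERVER_URL/jobs?appName=$1&classPath=$2&context=$3"
}

# Remove the context when it is no longer needed.
# $1 = context name
delete_context() {
    $CURL -X DELETE "$JOBSERVER_URL/contexts/$1"
}
```

Jobs submitted with the same context name reuse one SparkContext, which is what makes sharing and repeatedly querying cached data possible.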
For an example of how to create and submit an application through the Spark Jobserver, see the spark-jobserver demo included with DSE.
The default location of the demos directory depends on the type of installation:
- Package installations:
  /usr/share/dse/demos
- Tarball installations:
  <installation_location>/demos
Enabling SSL communication with Jobserver
To enable SSL encryption when connecting to the Jobserver, you must have a server certificate and a truststore containing the certificate.
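For a test environment, a self-signed certificate and matching truststore can be produced with the JDK keytool utility. This is a sketch under assumptions, not the required procedure: the function name, the alias jobserver, the CN=localhost subject, the 365-day validity, and the KEYTOOL override variable are all illustrative choices. Production deployments would normally use a certificate issued by a trusted CA.

```shell
#!/bin/sh
# KEYTOOL is a hypothetical override; KEYTOOL="echo keytool" previews the commands.
KEYTOOL="${KEYTOOL:-keytool}"

# $1 = keystore path, $2 = truststore path, $3 = password (at least 6 characters)
make_jobserver_stores() {
    keystore=$1
    truststore=$2
    password=$3
    cert_file="$keystore.pem"

    # 1. Generate a self-signed server key pair in the keystore.
    $KEYTOOL -genkeypair -alias jobserver -keyalg RSA -keysize 2048 \
        -validity 365 -dname "CN=localhost" \
        -keystore "$keystore" -storepass "$password" -keypass "$password"

    # 2. Export the server certificate in PEM format.
    $KEYTOOL -exportcert -alias jobserver -rfc -file "$cert_file" \
        -keystore "$keystore" -storepass "$password"

    # 3. Import the certificate into the truststore used by clients.
    $KEYTOOL -importcert -noprompt -alias jobserver -file "$cert_file" \
        -keystore "$truststore" -storepass "$password"
}
```

The keystore path and password passed here are the values that later go into the keystore and keystorePW settings.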
Add the following configuration section to the dse.conf file in the Spark Jobserver directory:
spray.can.server {
  ssl-encryption = on
  keystore = "path to keystore"
  keystorePW = "keystore password"
}
Restart the Jobserver after saving the configuration changes.
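After the restart, a quick way to confirm that SSL is active is to request a read-only endpoint over HTTPS while trusting only the server certificate. The sketch below assumes the Jobserver listens on localhost:8090 and exposes the open-source /jars listing endpoint. The function name and the CURL override variable are hypothetical.

```shell
#!/bin/sh
# CURL is a hypothetical override; CURL="echo curl" previews the request.
CURL="${CURL:-curl}"

# $1 = PEM file with the server certificate
# $2 = base URL (assumed default: https://localhost:8090)
# Returns non-zero if the TLS handshake or the HTTP request fails.
check_jobserver_ssl() {
    $CURL --silent --show-error --fail --cacert "$1" \
        "${2:-https://localhost:8090}/jars"
}
```

If the certificate's subject does not match the host name in the URL, curl rejects the connection, so the URL should use the name the certificate was issued for.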