Using the Spark Jobserver

DSE includes Spark Jobserver, a REST interface for submitting and managing Spark jobs.

DataStax Enterprise includes a bundled copy of the open-source Spark Jobserver, an optional component for submitting and managing Spark jobs, Spark contexts, and JARs on DSE Analytics clusters. See Components in the release notes for the version of the Spark Jobserver included in this version of DSE.

The Spark Jobserver accepts any valid spark-submit option at startup. To use the Jobserver:
  • Start the job server:
    dse spark-jobserver start [any_spark_submit_options]
  • Stop the job server:
    dse spark-jobserver stop
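
As a sketch of passing spark-submit options at startup, the wrapper below starts the Jobserver with explicit resource limits. `--executor-memory` and `--total-executor-cores` are standard spark-submit options; the function name and the sample values are illustrative assumptions, not DSE defaults.

```shell
#!/bin/sh
# Start the Jobserver with explicit resource limits.
# $1: executor memory (for example 2G)
# $2: total executor cores (for example 4)
# The function name is hypothetical; the options are ordinary spark-submit options.
start_jobserver_with_limits() {
  dse spark-jobserver start \
    --executor-memory "$1" \
    --total-executor-cores "$2"
}
```

Calling `start_jobserver_with_limits 2G 4` is equivalent to typing the full `dse spark-jobserver start` command with both options.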
The default location of the Spark Jobserver depends on the type of installation:
  • Package installations: /usr/share/dse/spark/spark-jobserver
  • Tarball installations: installation_location/resources/spark/spark-jobserver

All uploaded JARs, temporary files, and log files are stored in the user's $HOME/.spark-jobserver directory, which is created the first time Spark Jobserver starts.

Spark Jobserver is most useful for sharing cached data between jobs, running repeated queries against cached data, and starting jobs faster.

Note:

Running multiple SparkContext instances in a single JVM is not recommended, so do not create a new SparkContext for each submitted job in a single Spark Jobserver instance. Use one of the following two modes instead.

  • Persistent Context Mode: a single pre-created SparkContext shared by all jobs.
  • Context per JVM: each job has its own SparkContext in a separate JVM.

    By default, the H2 database is used for storing Spark Jobserver related metadata. In this setup, using Context per JVM requires additional configuration. See the Spark Jobserver docs for details.

    Note: In Context per JVM mode, job results must not contain instances of classes that are not present in the Spark Jobserver classpath. When a job returns a type that is unknown to the server, the problem can be recognized by the following log line:
    Association with remote system [akka.tcp://JobServer@127.0.0.1:45153] 
    has failed, address is now gated for [5000] ms. 
    Reason: [<unknown type name is placed here>] 
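
One way to look for this symptom is to search the Jobserver log files for the gating message. The sketch below assumes the logs are in the default $HOME/.spark-jobserver directory described earlier; the function name is hypothetical, and the search phrase is taken from the log line shown above.

```shell
#!/bin/sh
# Search Jobserver logs for gated Akka associations.
# $1: directory to search (optional, defaults to $HOME/.spark-jobserver)
# Prints each match as file:line-number:text; prints nothing if none is found.
find_gated_associations() {
  grep -rn "address is now gated" "${1:-$HOME/.spark-jobserver}"
}
```

Running `find_gated_associations` with no argument searches the default directory; any match points to the log file that contains the Reason line naming the unknown type.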

See the Spark Jobserver docs for configuration details.
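
To illustrate Persistent Context Mode, the sketch below uses the REST endpoints of the open-source Spark Jobserver (`/jars`, `/contexts`, `/jobs`) to upload a JAR, create one long-lived context, and run jobs in it. The base URL, port 8090, and all function names are assumptions for illustration; check the Jobserver docs for the values that apply to your DSE installation.

```shell
#!/bin/sh
# Base URL is an assumption; 8090 is the default port of the open-source Jobserver.
JOBSERVER_URL="${JOBSERVER_URL:-http://localhost:8090}"

# Upload an application JAR. $1: application name, $2: path to the JAR file.
upload_jar() {
  curl -s --data-binary @"$2" "$JOBSERVER_URL/jars/$1"
}

# Create a long-lived context that later jobs share. $1: context name.
create_context() {
  curl -s -d "" "$JOBSERVER_URL/contexts/$1"
}

# Run a job in an existing context.
# $1: application name, $2: fully qualified job class, $3: context name.
submit_job() {
  curl -s -d "" "$JOBSERVER_URL/jobs?appName=$1&classPath=$2&context=$3"
}
```

A typical sequence would be `upload_jar demo path/to/demo.jar`, then `create_context shared`, then any number of `submit_job demo com.example.WordCount shared` calls; the application, class, and context names here are placeholders. Because every job names the same context, cached data stays available between submissions.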

For an example of how to create and submit an application through the Spark Jobserver, see the spark-jobserver demo included with DSE.

The default location of the demos directory depends on the type of installation:
  • Package installations: /usr/share/dse/demos
  • Tarball installations: installation_location/demos

Enabling SSL communication with Jobserver

To enable SSL encryption when connecting to Jobserver, you must have a server certificate and a truststore containing the certificate. Add the following configuration section to the dse.conf file in the Spark Jobserver directory:

spray.can.server {
  ssl-encryption = on
  keystore = "path to keystore"
  keystorePW = "keystore password"
}
The default location of the Spark Jobserver depends on the type of installation:
  • Package installations: /usr/share/dse/spark/spark-jobserver
  • Tarball installations: installation_location/resources/spark/spark-jobserver

Restart the Jobserver after saving the configuration changes.
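
After the restart, a quick HTTPS request can confirm that encryption is active. The sketch below lists contexts through the open-source Jobserver's `/contexts` endpoint; it assumes the server certificate has been exported to a PEM file and that the Jobserver listens on port 8090, neither of which is stated in this document. The function name is hypothetical.

```shell
#!/bin/sh
# List contexts over HTTPS to confirm that SSL is enabled.
# $1: PEM file containing the server certificate (path is up to you)
# $2: host:port of the Jobserver (optional, assumed default localhost:8090)
check_jobserver_ssl() {
  curl -s --cacert "$1" "https://${2:-localhost:8090}/contexts"
}
```

If the certificate is trusted and SSL is on, `check_jobserver_ssl path/to/server.pem` returns the context list; a certificate or handshake error from curl indicates a configuration problem.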