Using the Apache Spark™ Jobserver

DataStax Enterprise includes a bundled copy of the open-source Spark Jobserver, an optional component for submitting and managing Spark jobs, Spark contexts, and JARs on DSE Analytics clusters. See the Components section of the release notes for the version of the Spark Jobserver included in this version of DSE.

Valid spark-submit options are supported and can be applied to the Spark Jobserver. To use the Jobserver:

  • Start the Jobserver:

    dse spark-jobserver start [any_spark_submit_options]

  • Stop the Jobserver:

    dse spark-jobserver stop
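
As an illustration of passing spark-submit options at startup, the sketch below wraps the start command in a small helper. The helper name jobserver_start and the RUN variable are hypothetical; --executor-memory and --total-executor-cores are standard spark-submit options, and the values shown are placeholders. By default the helper only prints the command it would run.

```shell
# Build the start command; print it instead of running it unless RUN=1.
jobserver_start() {
  # "$@" carries any spark-submit options through unchanged.
  set -- dse spark-jobserver start "$@"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Example: cap executor memory and total cores (placeholder values).
jobserver_start --executor-memory 2G --total-executor-cores 4
```

Setting RUN=1 in the environment makes the helper execute the command on a node where dse is installed.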

The default location of the Spark Jobserver depends on the type of installation:

  • Package installations and Installer-Services: /usr/share/dse/spark/spark-jobserver

  • Tarball installations and Installer-No Services: <installation_location>/resources/spark/spark-jobserver

Uploaded JARs, temporary files, and log files are stored in the $HOME/.spark-jobserver directory of the user who runs the Spark Jobserver. The directory is created the first time the Spark Jobserver starts.
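
Because this directory is resolved from $HOME, it differs for each user who starts the Jobserver. A minimal sketch for locating and inspecting it follows; the function names are hypothetical, and the layout inside the directory is not described here, so the sketch only lists whatever is present.

```shell
# Resolve the Jobserver working directory under the current user's home.
jobserver_home() {
  printf '%s\n' "${HOME}/.spark-jobserver"
}

# List the directory if it exists; otherwise report that it is missing,
# which suggests this user has not started the Jobserver yet.
jobserver_ls() {
  dir="$(jobserver_home)"
  if [ -d "$dir" ]; then
    ls -la "$dir"
  else
    echo "not created yet: $dir"
  fi
}

jobserver_ls
```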

The Spark Jobserver is most useful for sharing cached data, running repeated queries against cached data, and starting jobs faster.

Running multiple SparkContext instances in a single JVM is not recommended, so do not create a new SparkContext for each submitted job in a single Spark Jobserver instance. Instead, use the Spark Jobserver in one of the following two modes:

  • Persistent Context Mode: a single pre-created SparkContext shared by all jobs.

  • Context per JVM: each job has its own SparkContext in a separate JVM. See the Spark Jobserver docs for details.

    In Context per JVM mode, job results must not contain instances of classes that are not present in the Spark Jobserver classpath. A job that returns a type unknown to the server can be recognized by the following log line:

    Association with remote system [akka.tcp://JobServer@127.0.0.1:45153]
    has failed, address is now gated for [5000] ms.
    Reason: [<unknown type name is placed here>]
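
To check a log for this symptom, a short shell sketch can search for the gated-association message and print the accompanying reason. The function name find_gated_reasons and the JOBSERVER_LOG variable are hypothetical. The log file path is passed in explicitly because the log file name under $HOME/.spark-jobserver is not given here.

```shell
# Print the "Reason:" text that accompanies a gated-association message.
# $1: path to a Jobserver log file (the file name is an assumption).
find_gated_reasons() {
  # -A 2 also covers logs where the message wraps onto following lines.
  grep -A 2 'address is now gated' "$1" | grep 'Reason:'
}

# Example: scan the file named by JOBSERVER_LOG, if that variable is set.
if [ -n "${JOBSERVER_LOG:-}" ]; then
  find_gated_reasons "$JOBSERVER_LOG" || echo "no gated-association messages found"
fi
```

The type name printed in the Reason field is the class to add to the Spark Jobserver classpath or to remove from the job result.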

See the Spark Jobserver documentation for configuration details.

For an example of how to create and submit an application through the Spark Jobserver, see the spark-jobserver demo included with DSE.

The default location of the demos directory depends on the type of installation:

  • Package installations: /usr/share/dse/demos

  • Tarball installations: <installation_location>/demos
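
Separately from the bundled demo, the sketch below outlines the upload, create-context, submit sequence used by the open-source Spark Jobserver REST API with a shared (persistent) context. It is a sketch under assumptions: the address http://localhost:8090, the /jars, /contexts, and /jobs endpoints as described in the open-source project's documentation, and the hypothetical names my-app.jar, myapp, com.example.MyJob, and shared-context. Confirm the endpoints against the Jobserver version bundled with your DSE release. By default the requests are only printed.

```shell
# Assumed Jobserver address; override through the environment if it differs.
JOBSERVER_URL="${JOBSERVER_URL:-http://localhost:8090}"

# Compose the job-submission URL for an app name, job class, and context.
job_url() {
  printf '%s/jobs?appName=%s&classPath=%s&context=%s&sync=true\n' \
    "$JOBSERVER_URL" "$1" "$2" "$3"
}

# Print a command, or execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# 1. Upload the application JAR under an app name.
run curl --data-binary @my-app.jar "$JOBSERVER_URL/jars/myapp"
# 2. Create one long-lived context that all jobs will share.
run curl -X POST "$JOBSERVER_URL/contexts/shared-context"
# 3. Submit a job against the shared context and wait for its result.
run curl -X POST "$(job_url myapp com.example.MyJob shared-context)"
```

Reusing one named context for every submission is what lets later jobs see data cached by earlier ones.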

Enabling SSL communication with Jobserver

To enable SSL encryption when connecting to the Jobserver, you must have a server certificate and a truststore containing the certificate. Add the following configuration section to the dse.conf file in the Spark Jobserver directory.

spray.can.server {
  ssl-encryption = on
  keystore = "path to keystore"
  keystorePW = "keystore password"
}

Restart the Jobserver after saving the configuration changes.
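
Before restarting, a quick sanity check of the edited file can catch a missing setting. The sketch below is a plain text match, not a HOCON parser, and the function name ssl_configured and the CONF variable are hypothetical. It only confirms that the file contains an ssl-encryption = on line and a keystore entry.

```shell
# Succeed if the given config file enables SSL and names a keystore.
# $1: path to the dse.conf file in the Spark Jobserver directory.
ssl_configured() {
  grep -Eq '^[[:space:]]*ssl-encryption[[:space:]]*=[[:space:]]*on([[:space:]]|$)' "$1" &&
    grep -Eq '^[[:space:]]*keystore[[:space:]]*=' "$1"
}

# Example: check the file named by CONF, if that variable is set.
if [ -n "${CONF:-}" ]; then
  if ssl_configured "$CONF"; then
    echo "SSL enabled in $CONF"
  else
    echo "SSL not enabled in $CONF"
  fi
fi
```

After the restart, an HTTPS request made with a client that trusts the server certificate (for example, curl with its --cacert option) is one way to confirm that the Jobserver accepts encrypted connections.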
