Using the Spark Jobserver

DSE includes Spark Jobserver, a REST interface for submitting and managing Spark jobs.

DataStax Enterprise bundles the open-source Spark Jobserver as an optional component for submitting and managing Spark jobs, Spark contexts, and JARs on DSE Analytics clusters. Refer to the 5.0 Components section of the release notes to find the version of the Spark Jobserver included in this version of DSE.

The Jobserver accepts any valid spark-submit options. To start and stop the Jobserver:
$ dse spark-jobserver start [any_spark_submit_options]  # Start the Jobserver
$ dse spark-jobserver stop                              # Stop the Jobserver
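After the Jobserver starts, applications are managed through its REST API. A minimal sketch with curl follows; the open-source Spark Jobserver listens on port 8090 by default, and the JAR path, application name, and job class here are hypothetical placeholders, not values shipped with DSE:

```shell
# The open-source Spark Jobserver's default REST port is 8090;
# adjust host and port to match your deployment.
JOBSERVER_URL="http://localhost:8090"

# Upload an application JAR under the app name "demo"
# (the JAR path and app name are placeholders).
curl --data-binary @target/demo-app.jar "$JOBSERVER_URL/jars/demo"

# Submit a job by app name and main class (the class name is a placeholder).
curl -d "" "$JOBSERVER_URL/jobs?appName=demo&classPath=com.example.DemoJob"

# List submitted jobs and their statuses.
curl "$JOBSERVER_URL/jobs"
```

Because the JAR is uploaded once and jobs run against a long-lived context, repeated submissions avoid per-job JVM startup cost, which is one of the use cases noted below.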
The default location of the Spark Jobserver depends on the type of installation:
Installer-Services and Package installations /usr/share/dse/spark/spark-jobserver
Installer-No Services and Tarball installations install_location/resources/spark/spark-jobserver
Note: For Installer-Services and Package installations, make sure that the OS user that runs the Jobserver has permission to write to the Spark logs directory (/var/log/spark). Either start the Jobserver with sudo or change the access permissions on /var/log/spark. The Jobserver log file is located in /var/log/spark/job-server/spark-job-server.log.
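One way to adjust the permissions is sketched below. To keep the sketch runnable it uses a scratch directory; in a real installation the directory is /var/log/spark, the commands typically require sudo, and the exact mode bits should match your site's security policy:

```shell
# Scratch directory standing in for /var/log/spark (an assumption for this sketch).
LOG_DIR="/tmp/spark-log-demo"

# Ensure the job-server subdirectory exists, matching the documented log path layout.
mkdir -p "$LOG_DIR/job-server"

# Grant the owning user and group read/write access, and execute (search)
# permission on directories only, so the Jobserver user can write its log file.
chmod -R u+rwX,g+rwX "$LOG_DIR"
```

Alternatively, change the directory's owner to the account that runs the Jobserver (for example with chown) instead of widening the mode bits.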

Beneficial use cases for the Spark Jobserver include sharing cached data, repeated queries of cached data, and faster job starts.

For an example of how to create and submit an application through the Spark Jobserver, see the spark-jobserver demo included with DSE.

If your Spark cluster uses multiple datacenters and one of the datacenters is down, the Jobserver may fail to start. The default read and write consistency levels in hive-site.xml are set to LOCAL_ONE to avoid this problem, but you can change them to any valid consistency level. For example, to set the read consistency level to ONE and the write consistency level to QUORUM:

    <property>
      <name>cassandra.connection.readCL</name>
      <value>ONE</value>
    </property>

    <property>
      <name>cassandra.connection.writeCL</name>
      <value>QUORUM</value>
    </property>
The default location of the demos directory depends on the type of installation:
Installer-Services and Package installations /usr/share/dse/demos
Installer-No Services and Tarball installations install_location/demos
There are two instances of the hive-site.xml file.

For use with Spark, the default location of the hive-site.xml file is:

Installer-Services and Package installations /etc/dse/spark/hive-site.xml
Installer-No Services and Tarball installations install_location/resources/spark/conf/hive-site.xml

For use with Hive, the default location of the hive-site.xml file is:

Installer-Services and Package installations /etc/dse/hive/hive-site.xml
Installer-No Services and Tarball installations install_location/resources/hive/conf/hive-site.xml