Load the event logs from Spark jobs that were run with event logging enabled.
The Spark history server provides a way to load the event logs from Spark jobs that
were run with event logging enabled. It works only if the event log files were not
flushed before the Spark Master attempted to build the history user interface.
The default location of the spark-defaults.conf file depends on the type of
installation:

Package installations and Installer-Services installations:
/etc/dse/spark/spark-defaults.conf

Tarball installations and Installer-No Services installations:
installation_location/resources/spark/conf/spark-defaults.conf
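If you are not sure which type of installation you have, checking for the package
path is a quick test (a convenience check only; the paths are the ones listed
above):
# Prints the path if this is a package or Installer-Services installation
ls /etc/dse/spark/spark-defaults.conf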
Procedure
To enable the Spark history server:
-
Create a directory for event logs in the DSEFS file system:
dse hadoop fs -mkdir /spark
dse hadoop fs -mkdir /spark/events
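You can confirm that the directory was created before moving on (the dse hadoop
fs wrapper accepts the standard Hadoop filesystem subcommands):
dse hadoop fs -ls /spark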
-
On each node in the cluster, edit the spark-defaults.conf file to enable event
logging and specify the directory for event logs:
# Turns on event logging for applications submitted from this machine
# and sets the directory where event logs are written
spark.eventLog.enabled true
spark.eventLog.dir dsefs:///spark/events
# Sets the log directory that the history server reads from
spark.history.fs.logDirectory dsefs:///spark/events
# Optional property that changes the permissions set on event log files
# spark.eventLog.permissions=777
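After this change, any application submitted from the node writes its event log
under dsefs:///spark/events. One way to verify is to run a short job and then
list the directory (the class and JAR names below are placeholders only):
dse spark-submit --class com.example.SparkPi example.jar
dse hadoop fs -ls /spark/events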
-
Start the Spark history server on one of the nodes in the cluster:
The Spark history server is a front-end application that displays logging
data from all nodes in the Spark cluster. It can be started from any node in
the cluster.
If you've enabled authentication, set the authentication method and
credentials in a properties file and pass it to the dse
command. For example, for basic authentication:
spark.hadoop.com.datastax.bdp.fs.client.authentication=basic
spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.username=role_name
spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.password=password
If you set the event log location in spark-defaults.conf, also set the
spark.history.fs.logDirectory property in your properties file:
spark.history.fs.logDirectory=dsefs:///spark/events
Without a properties file, start the server with:
dse spark-history-server start
With a properties file:
dse spark-history-server start --properties-file properties_file
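For example, combining the authentication and log directory settings above into
a single file (the name history-server.properties is illustrative):
spark.hadoop.com.datastax.bdp.fs.client.authentication=basic
spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.username=role_name
spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.password=password
spark.history.fs.logDirectory=dsefs:///spark/events
You would then start the server with:
dse spark-history-server start --properties-file history-server.properties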
The history server is started and can be viewed by opening a browser to
http://node_hostname:18080.
Note: The Spark Master web UI does not show the historical logs. To work around
this known issue, access the history from port 18080.
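You can also check that the server is responding from the command line
(substitute your node's hostname; curl is assumed to be installed):
curl -s http://node_hostname:18080 | head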
-
When event logging is enabled, the default behavior is for all logs to be
saved, which causes the storage to grow over time. To enable automated cleanup,
edit spark-defaults.conf and set the following options:
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.interval 1d
spark.history.fs.cleaner.maxAge 7d
With these settings, automated cleanup is enabled, the cleanup runs daily, and
event logs older than seven days are deleted.
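The history server reads the cleaner settings when it starts, so restart it
after changing them:
dse spark-history-server stop
dse spark-history-server start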