Configuring the Apache Spark™ history server
The Spark history server provides a way to load the event logs from Spark jobs that were run with event logging enabled. Use the history server to view history for applications that the Spark Master web UI cannot display, for example when the event log files were not flushed before the Spark Master attempted to build a history user interface.
To enable the Spark history server:
Create a directory for event logs in the DSEFS file system:
dse hadoop fs -mkdir /spark
dse hadoop fs -mkdir /spark/events
On each node in the cluster, edit the spark-defaults.conf file to enable event logging and specify the directory for event logs:

# Turns on logging for applications submitted from this machine
spark.eventLog.dir dsefs:///spark/events
spark.eventLog.enabled true
# Sets the logging directory for the history server
spark.history.fs.logDirectory dsefs:///spark/events
# Optional property that changes permissions set to event log files
# spark.eventLog.permissions=777
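Since the same edit is needed on every node, it can be scripted. A minimal sketch, assuming the settings are appended with a heredoc; the CONF path below is a stand-in, not the real location, so substitute the spark-defaults.conf path for your installation type:

```shell
# Append the event-logging settings to spark-defaults.conf.
# CONF is a placeholder path; use your installation's spark-defaults.conf.
CONF=/tmp/spark-defaults.conf
cat >> "$CONF" <<'EOF'
spark.eventLog.dir dsefs:///spark/events
spark.eventLog.enabled true
spark.history.fs.logDirectory dsefs:///spark/events
EOF
```

Repeat the edit on each node in the cluster (for example over ssh), since each node logs the applications submitted from it.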
The location of the spark-defaults.conf file depends on the type of installation: package and Installer-Services installations use one location, while tarball and Installer-No Services installations use another.
Start the Spark history server on one of the nodes in the cluster:
The Spark history server is a front-end application that displays logging data from all nodes in the Spark cluster. It can be started from any node in the cluster.
If you’ve enabled authentication, set the authentication method and credentials in a properties file and pass it to the dse command. For example, for basic authentication:

spark.hadoop.com.datastax.bdp.fs.client.authentication=basic
spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.username=role name
spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.password=password
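A sketch of generating such a properties file; the ROLE and PASS values and the /tmp path are placeholders, not real credentials or a recommended location:

```shell
# Write a properties file with basic-auth credentials for the dse command.
# ROLE and PASS are placeholder values; replace them before use.
ROLE=my_role
PASS=my_password
cat > /tmp/history-server.properties <<EOF
spark.hadoop.com.datastax.bdp.fs.client.authentication=basic
spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.username=$ROLE
spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.password=$PASS
spark.history.fs.logDirectory=dsefs:///spark/events
EOF
```

Because a properties file replaces spark-defaults.conf entirely, spark.history.fs.logDirectory is included here as well. Restrict the file's permissions, since it contains a password in plain text.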
If you set the event log location in spark-defaults.conf, also set the spark.history.fs.logDirectory property in your properties file.
dse spark-history-server start
With a properties file:
dse spark-history-server start --properties-file <properties file>
If you specify a properties file, none of the configuration in spark-defaults.conf is used. The properties file must contain all the required configuration properties.
The history server is started and can be viewed by opening a browser to port 18080 on the node where it was started.
The Spark Master web UI does not show the historical logs. To work around this known issue, access the history from port 18080.
When event logging is enabled, the default behavior is for all logs to be saved, which causes the storage to grow over time. To enable automated cleanup, edit spark-defaults.conf and set the following options:

spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.interval 1d
spark.history.fs.cleaner.maxAge 7d
With these settings, automated cleanup is enabled, the cleanup runs daily, and logs older than seven days are deleted.
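As with the event-logging settings, the cleanup options can be appended on each node with a short script. A sketch, again using a stand-in CONF path rather than the real spark-defaults.conf location:

```shell
# Append the automated-cleanup settings to spark-defaults.conf.
# CONF is a placeholder path; use your installation's spark-defaults.conf.
CONF=/tmp/spark-defaults-cleanup.conf
cat >> "$CONF" <<'EOF'
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.interval 1d
spark.history.fs.cleaner.maxAge 7d
EOF
```

Restart the history server afterward so the new settings take effect.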