Monitoring Spark with the web interface

A Spark web interface is bundled with DataStax Enterprise. It facilitates monitoring, debugging, and managing Spark.

Note: When user credentials are specified in plain text on the dse command line, for example dse -u username -p password, the credentials appear in the logs of the Spark workers when the driver runs in cluster mode.

Tip: You can provide authentication credentials in several ways; see Credentials for authentication.

The Spark Master, Spark Worker, executor, and driver logs might include sensitive information, such as passwords and digest authentication tokens (in Kerberos authentication mode) that are passed on the command line or in the Spark configuration. DataStax recommends using only secure communication channels, such as VPN or SSH, to access the Spark web interface.
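For example, one way to keep credentials off the command line is a .dserc file in the home directory of the user who runs dse commands; the user name and password below are placeholders, so check Credentials for authentication for the options your DSE version supports.

    # Store credentials in ~/.dserc instead of passing them on the dse command line,
    # so they do not show up in process listings or Spark worker logs.
    cat > ~/.dserc <<'EOF'
    username=jdoe
    password=secret
    EOF
    chmod 600 ~/.dserc   # restrict the file to its owner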

Displaying fully qualified domain names in the web UI 

To display fully qualified domain names (FQDNs) in the Spark web UI, set the SPARK_PUBLIC_DNS variable in spark-env.sh on each Analytics node.

If SSL is enabled for the web UI, set SPARK_PUBLIC_DNS to the FQDN of the node.

The default location of the spark-env.sh file depends on the type of installation:

  • Installer-Services and Package installations: /etc/dse/spark/spark-env.sh
  • Installer-No Services and Tarball installations: install_location/resources/spark/conf/spark-env.sh
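For example, a minimal spark-env.sh entry for a node whose FQDN is node1.example.com (a placeholder host name) could look like this:

    # In spark-env.sh on each Analytics node:
    # advertise the fully qualified domain name in the Spark web UI
    export SPARK_PUBLIC_DNS="node1.example.com"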

Using the Spark web interface

To use the Spark web interface, follow the steps in the sections below; see the Spark documentation for updates on monitoring.
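As a quick check, assuming the default DSE port assignment of 7080 for the Spark Master web UI (substitute your own node address and port, and https:// if SSL is enabled), you can confirm that the interface is reachable before opening it in a browser:

    # Fetch the Spark Master web UI headers from an Analytics node
    curl -I http://spark-master.example.com:7080/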

Spark Worker nodes and debugging logs 

  • On the Spark Master page, click the ID of a worker node, in this example worker-20140314184018-10.168.193.41-41345. The Spark Worker page for that node appears and shows detailed information about the applications that are running on it.

    In this example, the Workers section lists three registered nodes. The summary information in the top left corner of the page can be misleading because it covers both alive and dead workers.



  • To get debugging information, click the stdout or stderr links in the Logs column.

Application: Spark shell

After you start a Spark context, the status of the worker appears, which can be useful for debugging. The interface also shows the memory required by running applications, so you can adjust which applications you run to fit the available resources.
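As a sketch, starting the DSE Spark shell and running a small job is enough to make a "Spark shell" application appear on the Master page while the shell session is open; the job itself is an arbitrary example rather than a DSE-specific API:

    $ dse spark                # start the Spark shell; a SparkContext is available as sc
    scala> sc.parallelize(1 to 1000000).map(_ * 2).count()
    res0: Long = 1000000

While the session is open, the Master page lists the application along with the cores and memory allocated to its executors.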



Spark Stages: Application progress

  • To see the progress of running applications, click the name of an application. The page shows every query that was executed, along with detailed information about how the data was distributed, which can be valuable for debugging.
  • You view Spark stages on the application UI port, which is not necessarily port 4040 as shown here.

    When you run multiple applications at the same time, Spark uses consecutive ports starting at 4040: 4040, 4041, and so on (see the sketch after this list).
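As an illustration, a job that includes a shuffle, such as the small word count below (run from the Spark shell; the data is made up), is split into more than one stage, and each stage's tasks and shuffle sizes appear on the application's stages page:

    scala> val words = sc.parallelize(Seq("spark", "dse", "spark", "worker"))
    scala> words.map(w => (w, 1)).reduceByKey(_ + _).collect()

The reduceByKey step forces a shuffle, so the job runs as two stages; the stages page, served on port 4040 or the next free port, shows how the tasks and data were distributed across executors.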