Monitoring Spark with the web interface 

A Spark web interface is bundled with DataStax Enterprise. The Spark web interface facilitates monitoring, debugging, and managing Spark.

Note: When user credentials are specified in plain text on the dse command line, for example $ dse -u username -p password, the credentials appear in the logs of Spark workers when the driver is run in cluster mode. The Spark Master, Spark Worker, executor, and driver logs might include sensitive information, such as passwords and digest authentication tokens for Kerberos authentication mode that are passed on the command line or in the Spark configuration. DataStax recommends accessing the Spark user interface only over secure communication channels such as VPN and SSH.

Using the Spark web interface

To use the Spark web interface:
  • In a browser, enter the public IP address of the Spark Master node followed by port number 7080.
  • To change the port, modify the configuration file.

See the Spark documentation for updates on monitoring.
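
The address described above can be sketched in Python. This is a minimal illustration, not part of DataStax Enterprise: the function names master_ui_url and is_reachable are hypothetical, the example IP address is a placeholder, and the sketch assumes the interface is served over plain HTTP on port 7080 unless a different port is passed in.

```python
import socket


def master_ui_url(host: str, port: int = 7080) -> str:
    """Build the address of the Spark Master web interface.

    Assumes plain HTTP; 7080 is the port named in this document.
    """
    return f"http://{host}:{port}"


def is_reachable(host: str, port: int = 7080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Placeholder address; substitute the public IP of your Spark Master node.
    print(master_ui_url("203.0.113.10"))
```

A quick is_reachable check before opening the browser can help tell a closed port or firewall rule apart from a problem with the interface itself.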

Spark Worker nodes and debugging logs 

  • In the Spark Master node page, click the ID of a worker node (in this example, worker-20140314184018-). The Spark Worker page for the node appears. In this web interface, you see detailed information about apps that are running.

    In this example, the Workers section lists three registered nodes. The summary information in the top left corner of the page can be misleading because it covers both alive and dead workers.

  • To get debugging information, click the stdout or stderr links in the Logs column.
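
Because the page summary covers alive and dead workers together, it can help to tally the states separately. The following is a minimal sketch under stated assumptions: the worker records are hypothetical stand-ins for the rows of the Workers section, and the state labels ALIVE and DEAD are assumed rather than taken from this document.

```python
from collections import Counter


def count_workers_by_state(workers):
    """Count worker records per state, for example ALIVE versus DEAD."""
    return Counter(worker["state"] for worker in workers)


# Hypothetical records; real IDs and states come from the Workers section.
workers = [
    {"id": "worker-a", "state": "ALIVE"},
    {"id": "worker-b", "state": "ALIVE"},
    {"id": "worker-c", "state": "DEAD"},
]

counts = count_workers_by_state(workers)
print(f"alive={counts['ALIVE']} dead={counts['DEAD']}")
```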

Application: Spark shell

After starting a Spark context, you can see the status of the worker, which can be useful for debugging. The interface also shows the memory that is required for apps that are running, so you can adjust which apps you run to meet your needs.

Spark Stages: Application progress

  • To see the progress of applications that are running, click the name of an application. The page shows every query that was executed, with detailed information about how the data was distributed, which might be valuable for debugging.
  • You can view Spark stages on the application's port, which is not necessarily port 4040 as shown here.

    When you run multiple applications at the same time, Spark tries to use subsequent ports starting at 4040, for example 4040, 4041, and so on.
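
The port sequence described above can be illustrated with a short Python sketch. It is not Spark's implementation: the function names and the max_tries limit are hypothetical, and the sketch only mimics the idea of trying consecutive ports until one is free.

```python
import socket


def candidate_ui_ports(start: int = 4040, count: int = 3) -> list[int]:
    """List consecutive ports beginning at start, e.g. 4040, 4041, 4042."""
    return [start + offset for offset in range(count)]


def first_free_port(start: int = 4040, max_tries: int = 16,
                    host: str = "127.0.0.1") -> int:
    """Return the first port at or after start that can be bound on host.

    max_tries is a hypothetical limit chosen for this sketch.
    """
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # Port is in use or unavailable; try the next one.
            return port
    raise RuntimeError(f"no free port in {start}..{start + max_tries - 1}")


if __name__ == "__main__":
    print(candidate_ui_ports())
    print(first_free_port())
```

When several applications are running, candidate_ui_ports gives the addresses worth checking in the browser, starting with 4040.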