Monitoring Spark with Spark Performance Objects

Spark performance data is stored in tables so that you can monitor and tune Spark analytics jobs.

dse.yaml

The location of the dse.yaml file depends on the type of installation:
Package installations: /etc/dse/dse.yaml
Tarball installations: installation_location/resources/dse/conf/dse.yaml

The Performance Service can collect data about the Spark cluster and Spark applications and save it to tables. You can then use these metrics to tune the performance of DSE Analytics applications and to identify bottlenecks.

If authorization is enabled in your cluster, you must grant the user who runs the Spark application SELECT permission on the dse_system.spark_metrics_config table and MODIFY permission on the dse_perf.spark_apps_snapshot table.
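The two grants described above can be generated for a given role with a small helper. This is only a sketch: the role name `spark_user` is a hypothetical placeholder, and the helper merely builds CQL `GRANT` statement text for an administrator to review and run (for example in cqlsh). It does not connect to the cluster.

```python
import re

# Permission required on each table, as stated in the text above.
REQUIRED_GRANTS = {
    "dse_system.spark_metrics_config": "SELECT",
    "dse_perf.spark_apps_snapshot": "MODIFY",
}


def build_grants(role):
    """Return the CQL GRANT statements needed by the given role."""
    # Accept only simple unquoted identifiers so the role name can be
    # placed into the statement text safely.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", role):
        raise ValueError(f"unsupported role name: {role!r}")
    return [
        f"GRANT {permission} ON TABLE {table} TO {role};"
        for table, permission in REQUIRED_GRANTS.items()
    ]


if __name__ == "__main__":
    # 'spark_user' is a hypothetical role name used for illustration.
    for statement in build_grants("spark_user"):
        print(statement)
```

Keeping the table-to-permission mapping in one place makes it easy to compare the generated statements against the requirements listed above.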

Monitoring Spark cluster information

The Performance Service stores information about DSE Analytics clusters in the dse_perf.spark_cluster_snapshot table. The cluster performance objects store the available and used resources in the cluster, including cores, memory, and workers. They also store overall information about all registered Spark applications, drivers, and executors, including the number of applications, the state of each application, and the host on which the application is running.

To enable collecting Spark cluster information, configure the options in the spark_cluster_info_options section of dse.yaml.

Table 1. Spark cluster info options
Option           Default value  Description
enabled          false          Enables or disables Spark cluster information collection.
refresh_rate_ms  10,000         The interval, in milliseconds, at which the data is collected and stored.
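To get a feel for what a given refresh_rate_ms value means in practice, the interval can be converted into collection cycles per hour. The function name below is hypothetical and the calculation is plain arithmetic; it assumes one collection cycle per interval.

```python
def collections_per_hour(refresh_rate_ms):
    """Number of collection cycles per hour for a refresh interval in ms."""
    if refresh_rate_ms <= 0:
        raise ValueError("refresh_rate_ms must be positive")
    ms_per_hour = 60 * 60 * 1000
    return ms_per_hour / refresh_rate_ms


# The default interval of 10,000 ms gives 360 cycles per hour.
print(collections_per_hour(10_000))  # 360.0
```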

The dse_perf.spark_cluster_snapshot table has the following columns:

name
The cluster name.
active_apps
The number of applications active in the cluster.
active_drivers
The number of active drivers in the cluster.
completed_apps
The number of completed applications in the cluster.
completed_drivers
The number of completed drivers in the cluster.
executors
The number of Spark executors in the cluster.
master_address
The host name and port number of the Spark Master node.
master_recovery_state
The state of the master node.
nodes
The number of nodes in the cluster.
total_cores
The total number of cores available on all the nodes in the cluster.
total_memory_mb
The total amount of memory in megabytes (MB) available to the cluster.
used_cores
The total number of cores currently used by the cluster.
used_memory_mb
The total amount of memory in megabytes (MB) used by the cluster.
workers
The total number of Spark Workers in the cluster.
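The used and total columns can be combined into utilization figures. The sketch below assumes a row has already been read from dse_perf.spark_cluster_snapshot into a dictionary keyed by column name. The sample values are hypothetical, not output from a real cluster.

```python
def utilization(snapshot):
    """Return the fraction of cores and memory in use for one snapshot row."""

    def fraction(used, total):
        # Guard against a cluster that reports zero capacity.
        return used / total if total else 0.0

    return {
        "cores": fraction(snapshot["used_cores"], snapshot["total_cores"]),
        "memory": fraction(
            snapshot["used_memory_mb"], snapshot["total_memory_mb"]
        ),
    }


# Hypothetical values shaped like one row of the snapshot table.
row = {
    "name": "example",
    "total_cores": 16,
    "used_cores": 4,
    "total_memory_mb": 32768,
    "used_memory_mb": 8192,
}
print(utilization(row))  # {'cores': 0.25, 'memory': 0.25}
```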

Monitoring Spark application information

Spark application performance information is stored per application in the dse_perf.spark_apps_snapshot table and is updated whenever a task finishes.

To enable collecting Spark application information, configure the options in the spark_application_info_options section of dse.yaml.

Table 2. Spark application information options
Option           Default  Description
enabled          false    Enables or disables collecting Spark application information.
refresh_rate_ms  10,000   The interval, in milliseconds, at which the data is collected and stored.

The driver subsection of spark_application_info_options controls the metrics that are collected by the Spark Driver.

Table 3. Spark Driver information options
Option           Default  Description
sink             false    Enables or disables collecting metrics from the Spark Driver.
connectorSource  false    Enables or disables collecting Spark Cassandra Connector metrics.
jvmSource        false    Enables or disables collecting JVM heap and garbage collection metrics from the Spark Driver.
stateSource      false    Enables or disables collecting application state metrics.

The executor subsection of spark_application_info_options controls the metrics collected by the Spark executors.

Table 4. Spark executor information options
Option           Default  Description
sink             false    Enables or disables collecting Spark executor metrics.
connectorSource  false    Enables or disables collecting Spark Cassandra Connector metrics from the Spark executors.
jvmSource        false    Enables or disables collecting JVM heap and garbage collection metrics from the Spark executors.
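To show how the options in Tables 2 through 4 nest inside spark_application_info_options, the sketch below renders a settings dictionary as an indented fragment in the style of dse.yaml. The option names come from the tables. The chosen values are one hypothetical configuration rather than recommended settings, and `render_yaml` is an illustrative helper, not part of DSE. It handles only nested dictionaries of simple scalar values.

```python
def render_yaml(mapping, indent=0):
    """Render a nested dict of scalars as indented YAML-style lines."""
    lines = []
    pad = " " * indent
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(render_yaml(value, indent + 4))
        elif isinstance(value, bool):
            # YAML booleans are lowercase, unlike Python's True/False.
            lines.append(f"{pad}{key}: {str(value).lower()}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


# Hypothetical configuration: collection enabled, JVM metrics left off.
settings = {
    "spark_application_info_options": {
        "enabled": True,
        "refresh_rate_ms": 10000,
        "driver": {
            "sink": True,
            "connectorSource": True,
            "jvmSource": False,
            "stateSource": True,
        },
        "executor": {
            "sink": True,
            "connectorSource": True,
            "jvmSource": False,
        },
    }
}

print("\n".join(render_yaml(settings)))
```

The same helper can render the flat spark_cluster_info_options section by passing a dictionary with only the enabled and refresh_rate_ms keys.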

The dse_perf.spark_apps_snapshot table has the following columns:

application_id
component_id
metric_id
count
metric_type
rate_15_min
rate_1_min
rate_5_min
rate_mean
snapshot_75th_percentile
snapshot_95th_percentile
snapshot_98th_percentile
snapshot_999th_percentile
snapshot_99th_percentile
snapshot_max
snapshot_mean
snapshot_median
snapshot_min
snapshot_stddev
value
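When querying this table, it can help to validate the requested column names against the list above before building the statement. The sketch below only produces CQL text. It makes no assumption about the table's primary key, so it emits a plain SELECT with a LIMIT rather than a WHERE clause. The function name and the default limit are hypothetical.

```python
# Column names as listed above for dse_perf.spark_apps_snapshot.
SNAPSHOT_COLUMNS = (
    "application_id", "component_id", "metric_id", "count", "metric_type",
    "rate_15_min", "rate_1_min", "rate_5_min", "rate_mean",
    "snapshot_75th_percentile", "snapshot_95th_percentile",
    "snapshot_98th_percentile", "snapshot_999th_percentile",
    "snapshot_99th_percentile", "snapshot_max", "snapshot_mean",
    "snapshot_median", "snapshot_min", "snapshot_stddev", "value",
)


def build_select(columns, limit=100):
    """Build a SELECT statement over the documented columns only."""
    if not columns:
        raise ValueError("at least one column is required")
    unknown = [c for c in columns if c not in SNAPSHOT_COLUMNS]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM dse_perf.spark_apps_snapshot LIMIT {limit};"


print(build_select(["application_id", "metric_id", "value"], limit=10))
```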