Monitoring Spark with Spark Performance Objects

Spark performance data is stored in tables so that you can monitor and tune Spark analytics jobs.

dse.yaml

The location of the dse.yaml file depends on the type of installation:

Package installations	/etc/dse/dse.yaml
Tarball installations	`installation_location`/resources/dse/conf/dse.yaml

The Performance Service can collect data associated with the Spark cluster and Spark applications and save it to tables. You can use these metrics to tune the performance of DSE Analytics applications and to identify bottlenecks.

If authorization is enabled in your cluster, you must grant the user who runs the Spark application SELECT permission on the dse_system.spark_metrics_config table and MODIFY permission on the dse_perf.spark_apps_snapshot table.
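As an illustration of the two grants described above, the following Python sketch builds the corresponding CQL statements. The function name and the role name spark_user are hypothetical. The statements use the standard CQL GRANT syntax and would still need to be run in cqlsh by a role that is allowed to grant permissions.

```python
def spark_metrics_grants(role: str) -> list[str]:
    """Return the CQL GRANT statements a Spark application user needs."""
    return [
        # Read access to the metrics configuration table.
        f"GRANT SELECT ON TABLE dse_system.spark_metrics_config TO {role};",
        # Write access to the application snapshot table.
        f"GRANT MODIFY ON TABLE dse_perf.spark_apps_snapshot TO {role};",
    ]


# "spark_user" is a placeholder role name, not one defined by DSE.
for statement in spark_metrics_grants("spark_user"):
    print(statement)
```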

Monitoring Spark cluster information

The Performance Service stores information about DSE Analytics clusters in the dse_perf.spark_cluster_snapshot table. The cluster performance objects store the available and used resources in the cluster, including cores, memory, and workers. They also store overall information about all registered Spark applications, drivers, and executors, including the number of applications, the state of each application, and the host on which each application is running.

To enable collecting Spark cluster information, configure the options in the spark_cluster_info_options section of dse.yaml.

Table 1. Spark cluster information options
Option	Default	Description
enabled	false	Enables or disables Spark cluster information collection.
refresh_rate_ms	10000	The interval, in milliseconds, at which the data is collected and stored.
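The options in the table above can be pictured as a small YAML fragment. The Python sketch below renders one; the function name is hypothetical, and the four-space indentation is an assumption about layout, so compare the output with the existing section in your own dse.yaml before copying it.

```python
def render_cluster_info_options(enabled: bool = False,
                                refresh_rate_ms: int = 10000) -> str:
    """Render a spark_cluster_info_options fragment as YAML text."""
    lines = [
        "spark_cluster_info_options:",
        # YAML booleans are written in lowercase.
        f"    enabled: {str(enabled).lower()}",
        f"    refresh_rate_ms: {refresh_rate_ms}",
    ]
    return "\n".join(lines)


# Enable collection and keep the default 10-second refresh rate.
print(render_cluster_info_options(enabled=True))
```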

The dse_perf.spark_cluster_snapshot table has the following columns:

name: The cluster name.
active_apps: The number of applications active in the cluster.
active_drivers: The number of active drivers in the cluster.
completed_apps: The number of completed applications in the cluster.
completed_drivers: The number of completed drivers in the cluster.
executors: The number of Spark executors in the cluster.
master_address: The host name and port number of the Spark Master node.
master_recovery_state: The recovery state of the Spark Master node.
nodes: The number of nodes in the cluster.
total_cores: The total number of cores available on all the nodes in the cluster.
total_memory_mb: The total amount of memory in megabytes (MB) available to the cluster.
used_cores: The total number of cores currently used by the cluster.
used_memory_mb: The total amount of memory in megabytes (MB) used by the cluster.
workers: The total number of Spark Workers in the cluster.
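One way to use these columns is to derive a utilization figure from a single snapshot row. The following sketch assumes the row has already been fetched and converted to a Python dict keyed by column name. The function name and the sample values are illustrative, not real measurements.

```python
def cluster_utilization(row: dict) -> dict:
    """Return the fraction of cores and memory in use for one snapshot row."""

    def fraction(used, total):
        # Report 0.0 instead of dividing by zero when nothing is registered.
        return used / total if total else 0.0

    return {
        "cores": fraction(row["used_cores"], row["total_cores"]),
        "memory": fraction(row["used_memory_mb"], row["total_memory_mb"]),
    }


# Hypothetical row from dse_perf.spark_cluster_snapshot.
sample_row = {
    "name": "example_cluster",
    "total_cores": 16,
    "used_cores": 4,
    "total_memory_mb": 32768,
    "used_memory_mb": 8192,
}
usage = cluster_utilization(sample_row)
print(f"cores: {usage['cores']:.0%}, memory: {usage['memory']:.0%}")
```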

Monitoring Spark application information

Spark application performance information is stored per application in the dse_perf.spark_apps_snapshot table and is updated whenever a task finishes.

To enable collecting Spark application information, configure the options in the spark_application_info_options section of dse.yaml.

Table 2. Spark application information options
Option	Default	Description
enabled	false	Enables or disables collecting Spark application information.
refresh_rate_ms	10000	The interval, in milliseconds, at which the data is collected and stored.

The driver subsection of spark_application_info_options controls the metrics that are collected by the Spark Driver.

Table 3. Spark Driver information options
Option	Default	Description
sink	false	Enables or disables collecting metrics from the Spark Driver.
connectorSource	false	Enables or disables collecting Spark Cassandra Connector metrics from the Spark Driver.
jvmSource	false	Enables or disables collecting JVM heap and garbage collection metrics from the Spark Driver.
stateSource	false	Enables or disables collecting application state metrics.

The executor subsection of spark_application_info_options controls the metrics collected by the Spark executors.

Table 4. Spark executor information options
Option	Default	Description
sink	false	Enables or disables collecting Spark executor metrics.
connectorSource	false	Enables or disables collecting Spark Cassandra Connector metrics from the Spark executors.
jvmSource	false	Enables or disables collecting JVM heap and garbage collection metrics from the Spark executors.
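Because the driver and executor subsections accept different option names, a quick check can catch a misplaced key before dse.yaml is edited. The sketch below assumes the settings are held in a plain Python dict that mirrors the section layout. The helper name and the sample settings are hypothetical, and the allowed names are taken only from the two tables above.

```python
# Option names listed in the driver and executor tables above.
DRIVER_OPTIONS = {"sink", "connectorSource", "jvmSource", "stateSource"}
EXECUTOR_OPTIONS = {"sink", "connectorSource", "jvmSource"}


def unknown_options(settings: dict) -> list[str]:
    """Return dotted names of subsection options that the tables do not list."""
    allowed = {"driver": DRIVER_OPTIONS, "executor": EXECUTOR_OPTIONS}
    problems = []
    for subsection, names in allowed.items():
        for key in settings.get(subsection, {}):
            if key not in names:
                problems.append(f"{subsection}.{key}")
    return sorted(problems)


# Hypothetical settings for spark_application_info_options.
settings = {
    "enabled": True,
    "refresh_rate_ms": 10000,
    "driver": {"sink": True, "jvmSource": True, "stateSource": True},
    # stateSource appears only in the driver table, so it is flagged here.
    "executor": {"sink": True, "stateSource": True},
}
print(unknown_options(settings))
```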

The dse_perf.spark_apps_snapshot table has the following columns:

application_id
component_id
metric_id
count
metric_type
rate_15_min
rate_1_min
rate_5_min
rate_mean
snapshot_75th_percentile
snapshot_95th_percentile
snapshot_98th_percentile
snapshot_999th_percentile
snapshot_99th_percentile
snapshot_max
snapshot_mean
snapshot_median
snapshot_min
snapshot_stddev
value
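As a sketch of how these columns might be used, the following Python function ranks the metrics of one application by the snapshot_99th_percentile column. It assumes the rows have already been read from dse_perf.spark_apps_snapshot into dicts keyed by column name, and that rows without percentile data hold a null in that column. The function name, identifiers, and values in the sample data are made up for illustration.

```python
def top_metrics_by_p99(rows, application_id, limit=3):
    """Return (component_id, metric_id, p99) tuples, largest p99 first."""
    candidates = [
        row for row in rows
        if row["application_id"] == application_id
        # Skip rows that carry no percentile data (assumed to be null).
        and row.get("snapshot_99th_percentile") is not None
    ]
    candidates.sort(key=lambda row: row["snapshot_99th_percentile"], reverse=True)
    return [
        (row["component_id"], row["metric_id"], row["snapshot_99th_percentile"])
        for row in candidates[:limit]
    ]


# Hypothetical rows; real application, component, and metric IDs will differ.
rows = [
    {"application_id": "app-1", "component_id": "driver", "metric_id": "metric_a",
     "snapshot_99th_percentile": 12.5},
    {"application_id": "app-1", "component_id": "0", "metric_id": "metric_b",
     "snapshot_99th_percentile": 48.0},
    {"application_id": "app-1", "component_id": "0", "metric_id": "metric_c",
     "snapshot_99th_percentile": None},
    {"application_id": "app-2", "component_id": "driver", "metric_id": "metric_a",
     "snapshot_99th_percentile": 99.0},
]
for entry in top_metrics_by_p99(rows, "app-1"):
    print(entry)
```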