Monitoring Spark with Spark Performance Objects
Performance data is stored in a table to allow you to monitor and tune Spark analytics jobs.
Where is the dse.yaml file?
The location of the dse.yaml file depends on the type of installation:

Installation Type | Location
---|---
Package installations + Installer-Services installations | /etc/dse/dse.yaml
Tarball installations + Installer-No Services installations | installation_location/resources/dse/conf/dse.yaml
The Performance Service can collect data about the Spark cluster and Spark applications and save it to a table. This lets you monitor the metrics of DSE Analytics applications to tune performance and identify bottlenecks.
If authorization is enabled in your cluster, you must grant the user who runs the Spark application SELECT permission on the dse_system.spark_metrics_config table, and MODIFY permission on the dse_perf.spark_apps_snapshot table.
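For example, the following CQL statements grant those permissions; the role name jdoe is a placeholder for your own Spark user:

```cql
-- Allow reading the metrics configuration
GRANT SELECT ON TABLE dse_system.spark_metrics_config TO jdoe;
-- Allow writing application snapshots
GRANT MODIFY ON TABLE dse_perf.spark_apps_snapshot TO jdoe;
```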
Monitoring Spark cluster information
The Performance Service stores information about DSE Analytics clusters in the dse_perf.spark_cluster_snapshot
table.
The cluster performance objects store the available and used resources in the cluster (cores, memory, and workers), as well as overall information about all registered Spark applications, drivers, and executors, including the number of applications, the state of each application, and the host on which each application runs.
To enable collecting Spark cluster information, configure the options in the spark_cluster_info_options
section of dse.yaml
.
Option | Default | Description
---|---|---
enabled | false | Enables or disables Spark cluster information collection.
refresh_rate_ms | 10,000 | The interval in milliseconds at which the data is collected and stored.
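A minimal dse.yaml sketch that turns cluster collection on, using the option names from the table above:

```yaml
# dse.yaml excerpt: collect Spark cluster information
spark_cluster_info_options:
    enabled: true
    refresh_rate_ms: 10000  # collect and store cluster data every 10 seconds
```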
The dse_perf.spark_cluster_snapshot table has the following columns:

- name: The cluster name.
- active_apps: The number of applications active in the cluster.
- active_drivers: The number of active drivers in the cluster.
- completed_apps: The number of completed applications in the cluster.
- completed_drivers: The number of completed drivers in the cluster.
- executors: The number of Spark executors in the cluster.
- master_address: The host name and port number of the Spark Master node.
- master_recovery_state: The state of the master node.
- nodes: The number of nodes in the cluster.
- total_cores: The total number of cores available on all the nodes in the cluster.
- total_memory_mb: The total amount of memory in megabytes (MB) available to the cluster.
- used_cores: The total number of cores currently used by the cluster.
- used_memory_mb: The total amount of memory in megabytes (MB) used by the cluster.
- workers: The total number of Spark Workers in the cluster.
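With collection enabled, the latest snapshot can be read back with a plain CQL query. This minimal example selects a few of the columns listed above:

```cql
-- Inspect cluster resource usage from the Performance Service snapshot
SELECT name, active_apps, workers, used_cores, total_cores,
       used_memory_mb, total_memory_mb
FROM dse_perf.spark_cluster_snapshot;
```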
Monitoring Spark application information
Spark application performance information is stored per application in the dse_perf.spark_apps_snapshot table and is updated whenever a task finishes.
To enable collecting Spark application information, configure the options in the spark_application_info_options
section of dse.yaml
.
Option | Default | Description
---|---|---
enabled | false | Enables or disables collecting Spark application information.
refresh_rate_ms | 10,000 | The interval in milliseconds at which the data is collected and stored.
The driver subsection of spark_application_info_options
controls the metrics that are collected by the Spark Driver.
Option | Default | Description
---|---|---
sink | false | Enables or disables collecting metrics from the Spark Driver.
connectorSource | false | Enables or disables collecting Spark Cassandra Connector metrics.
jvmSource | false | Enables or disables collecting JVM heap and garbage collection metrics from the Spark Driver.
stateSource | false | Enables or disables collecting application state metrics.
The executor subsection of spark_application_info_options
controls the metrics collected by the Spark executors.
Option | Default | Description
---|---|---
sink | false | Enables or disables collecting Spark executor metrics.
connectorSource | false | Enables or disables collecting Spark Cassandra Connector metrics from the Spark executors.
jvmSource | false | Enables or disables collecting JVM heap and garbage collection metrics from the Spark executors.
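Putting the pieces together, here is a minimal dse.yaml sketch that enables application collection with all driver and executor sources turned on, using the option names from the tables above:

```yaml
# dse.yaml excerpt: collect Spark application information
spark_application_info_options:
    enabled: true
    refresh_rate_ms: 10000     # collect and store application data every 10 seconds
    driver:
        sink: true             # metrics from the Spark Driver
        connectorSource: true  # Spark Cassandra Connector metrics
        jvmSource: true        # JVM heap and garbage collection metrics
        stateSource: true      # application state metrics
    executor:
        sink: true             # metrics from the Spark executors
        connectorSource: true
        jvmSource: true
```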
The dse_perf.spark_apps_snapshot table has the following columns:

- application_id
- component_id
- metric_id
- count
- metric_type
- rate_15_min
- rate_1_min
- rate_5_min
- rate_mean
- snapshot_75th_percentile
- snapshot_95th_percentile
- snapshot_98th_percentile
- snapshot_999th_percentile
- snapshot_99th_percentile
- snapshot_max
- snapshot_mean
- snapshot_median
- snapshot_min
- snapshot_stddev
- value
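As a final sketch, the collected metrics for a single application can be read back with CQL. The application ID below is a placeholder, and the WHERE clause assumes application_id is the partition key:

```cql
-- Read the snapshot metrics recorded for one Spark application
SELECT component_id, metric_id, metric_type, count, rate_1_min, snapshot_mean
FROM dse_perf.spark_apps_snapshot
WHERE application_id = 'app-20240101000000-0000';
```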