Configuring Spark logging options

You can configure where DataStax Enterprise writes the Spark logs and set Spark logging options, such as log levels and rolling logging.

Log directories

The Spark logging directory is the directory where the Spark components store individual log files. DataStax Enterprise places logs in the following locations:

Executor logs
  • <SPARK_WORKER_DIR>/worker-<n>/<application_id>/<executor_id>/stderr

  • <SPARK_WORKER_DIR>/worker-<n>/<application_id>/<executor_id>/stdout

Spark Master/Worker logs

  • Spark Master: the global system.log

  • Spark Worker: <SPARK_WORKER_LOG_DIR>/worker-<n>/worker.log

The default <SPARK_WORKER_LOG_DIR> location is /var/log/spark/worker.

Default log directory for Spark SQL Thrift server

The default log directory when starting the Spark SQL Thrift server is $HOME/spark-thrift-server.

AlwaysOn SQL server

<ALWAYSON_SQL_LOG_DIR>/service.log

The default <ALWAYSON_SQL_LOG_DIR> location is /var/log/spark/alwayson_sql/.

Spark Shell and application logs

Spark Shell and application logs are output to the console.

SparkR shell log

The default log location for the SparkR shell is $HOME/.sparkR.log.

Log configuration file

Log configuration files are located in the same directory as spark-env.sh.

Procedure

To configure Spark logging options:

  1. Configure logging options, such as log levels, in the following files:

    Component                                         Log configuration file
    Executors                                         logback-spark-executor.xml
    Spark Master                                      logback.xml
    Spark Worker                                      logback-spark-server.xml
    Spark Driver (Spark Shell, Spark applications)    logback-spark.xml
    SparkR                                            logback-sparkR.xml
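
    For example, to change the log level for a component, add or edit a logger element in that component's file. This is a minimal sketch using standard Logback syntax; the org.apache.spark logger name and the WARN level are illustrative choices, not required values.

    <!-- Illustrative Logback override: log Spark's own classes at WARN -->
    <logger name="org.apache.spark" level="WARN"/>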

  2. If you want to enable rolling logging for Spark executors, add the following options to spark-daemon-defaults.conf.

    The following example enables size-based rolling with a maximum log file size of 50,000 bytes and retains 3 rolled log files before deletion.

    spark.executor.logs.rolling.maxRetainedFiles 3
    spark.executor.logs.rolling.strategy size
    spark.executor.logs.rolling.maxSize 50000
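
    Rolling can also be driven by time instead of size. The following sketch assumes the standard Spark rolling-log properties; the daily interval is an illustrative choice.

    spark.executor.logs.rolling.strategy time
    spark.executor.logs.rolling.time.interval daily
    spark.executor.logs.rolling.maxRetainedFiles 3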

    The default location of the Spark configuration files depends on the type of installation:

    • Package installations: /etc/dse/spark/

    • Tarball installations: <installation_location>/resources/spark/conf

  3. Configure a safe communication channel to access the Spark user interface.

    When user credentials are specified in plain text on the dse command line, for example dse -u <username> -p <password>, the credentials appear in the logs of Spark workers when the driver runs in cluster mode.

    The Spark Master, Spark Worker, executor, and driver logs might include sensitive information, such as passwords and digest authentication tokens for Kerberos that are passed on the command line or in the Spark configuration. DataStax recommends using only safe communication channels, such as a VPN or SSH, to access the Spark user interface.
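
    For example, rather than exposing the Spark user interface directly, you can reach it through an SSH tunnel. This sketch assumes the Spark Master web UI is listening on its default DSE port, 7080, and uses spark-master.example.com as a placeholder host name:

    ssh -L 7080:localhost:7080 <user>@spark-master.example.com
    # Then browse to http://localhost:7080 on the local machine.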

    You can provide authentication credentials in several ways; see Credentials for authentication.
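
    For example, one alternative to plain-text credentials on the command line is a .dserc file in the home directory of the user who runs the dse command. This is a minimal sketch assuming the standard ~/.dserc format; restrict the file's permissions so only its owner can read it.

    username=<username>
    password=<password>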
