Configuring Spark logging options 

Configure logging for the Spark components in DataStax Enterprise: where each component writes its logs, which file controls its logging options, and how to enable rolling logs for executors.

Log directories

The location of the logback.xml file depends on the type of installation:
  • Installer-Services and Package installations: /etc/dse/cassandra/logback.xml
  • Installer-No Services and Tarball installations: install_location/resources/cassandra/logback.xml

The Spark logging directory is the directory where the Spark components store individual log files. DataStax Enterprise places logs in the following locations:
Executor logs
  • SPARK_WORKER_DIR/worker-n/application_id/executor_id/stderr
  • SPARK_WORKER_DIR/worker-n/application_id/executor_id/stdout
Spark Master/Worker logs
  • Spark Master: the global system.log
  • Spark Worker: SPARK_WORKER_LOG_DIR/worker-n/worker.log

The default SPARK_WORKER_LOG_DIR location is /var/log/spark/worker.

Default log directory for Spark CQL Thrift server 
By default, the Spark CQL Thrift server writes its logs to $HOME/spark-thrift-server.
Spark Shell and application logs
Logs from the Spark Driver (Spark Shell and Spark applications) are written to the console.
Log configuration file
Log configuration files are located in the same directory as spark-env.sh.
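
The executor log layout described above (SPARK_WORKER_DIR/worker-n/application_id/executor_id/stderr and stdout) can be walked with a short script. The following is a minimal Python sketch, not a DataStax tool: the function name find_executor_logs is hypothetical, and it assumes that SPARK_WORKER_DIR is set in the environment and that the directory follows exactly the layout shown on this page.

```python
import os
from pathlib import Path


def find_executor_logs(worker_dir, application_id=None):
    """Return executor stderr/stdout files found under worker_dir.

    Assumes the layout worker-n/application_id/executor_id/{stderr,stdout}.
    Pass application_id to restrict the search to one application.
    """
    app = application_id or "*"
    logs = []
    for stream in ("stderr", "stdout"):
        # One glob per stream name; worker and executor directories are wildcards.
        logs.extend(Path(worker_dir).glob(f"worker-*/{app}/*/{stream}"))
    return sorted(p for p in logs if p.is_file())


if __name__ == "__main__":
    # SPARK_WORKER_DIR is read from the environment; nothing is printed if it is unset.
    worker_dir = os.environ.get("SPARK_WORKER_DIR")
    if worker_dir:
        for path in find_executor_logs(worker_dir):
            print(path)
```

Passing an application ID as the second argument limits the result to the executors of that application.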

Procedure

To configure Spark logging options:

  1. Configure logging options, such as log levels, in the following files:
    Component                                         Configuration file
    Executors                                         logback-spark-executor.xml
    Spark Master                                      logback.xml
    Spark Worker                                      logback-spark-server.xml
    Spark Driver (Spark Shell, Spark applications)    logback-spark.xml
  2. To enable rolling logging for Spark executors, add the following options to spark-defaults.conf.

    This example rolls the executor logs by size, with a maximum size of 50,000 bytes per log file, and retains 3 log files. Older log files are deleted.

    spark.executor.logs.rolling.maxRetainedFiles 3
    spark.executor.logs.rolling.strategy size
    spark.executor.logs.rolling.maxSize 50000

    The default location of the Spark configuration files depends on the type of installation:
      • Installer-Services and Package installations: /etc/dse/spark/
      • Installer-No Services and Tarball installations: install_location/resources/spark/conf/
  3. Use a secure communication channel to access the Spark user interface.
    Note: The Spark Master, Spark Worker, executor, and driver logs might include sensitive information, such as passwords and digest authentication tokens for Kerberos authentication mode that are passed on the command line or in the Spark configuration. For example, when user credentials are specified in plain text on the dse command line, as in $ dse -u username -p password, the credentials appear in the Spark worker logs when the driver runs in cluster mode. DataStax recommends using only secure communication channels, such as VPN and SSH, to access the Spark user interface.
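
To confirm that the rolling options from step 2 are actually present in spark-defaults.conf, a small check such as the following may help. This is a minimal Python sketch under stated assumptions: the helper names parse_spark_defaults and missing_rolling_options are hypothetical, and the parser handles only the whitespace-separated "key value" form used on this page, with "#" comment lines skipped.

```python
ROLLING_KEYS = (
    "spark.executor.logs.rolling.strategy",
    "spark.executor.logs.rolling.maxSize",
    "spark.executor.logs.rolling.maxRetainedFiles",
)


def parse_spark_defaults(text):
    """Parse 'key value' lines into a dict; skip blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def missing_rolling_options(props):
    """Return the rolling-log keys from step 2 that are not set in props."""
    return [key for key in ROLLING_KEYS if key not in props]


if __name__ == "__main__":
    # The same three lines shown in step 2.
    sample = """
    spark.executor.logs.rolling.maxRetainedFiles 3
    spark.executor.logs.rolling.strategy size
    spark.executor.logs.rolling.maxSize 50000
    """
    print(missing_rolling_options(parse_spark_defaults(sample)))
```

To check a real installation, read the spark-defaults.conf file from the configuration directory for your installation type and pass its contents to parse_spark_defaults.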