Running Spark Jobs with Kerberos
Spark jobs may be run against a Kerberos-enabled DataStax Enterprise (DSE) database. Defining a Kerberos scheme only connects Spark to the DSE database; it does not authenticate Spark components with each other.
Authenticate using the kinit command before starting the Spark job.
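For example, an interactive login or a keytab-based login might look like the following. The principal, realm, and keytab path are placeholders; substitute your own.

```
# Interactive: prompts for the password of the Kerberos principal
kinit user@EXAMPLE.COM

# Non-interactive: authenticate with a keytab file instead
kinit -kt /path/to/user.keytab user@EXAMPLE.COM
```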
With Kerberos authentication, the Spark launcher connects to DSE with Kerberos credentials and requests DSE to generate a delegation token. The Spark driver and executors use the delegation token to connect to the cluster.
After the application finishes, the delegation token should be cancelled. In client mode this happens automatically: the token is cancelled when the user stops the application. In cluster mode, however, the user must cancel the delegation token manually when the application is eventually stopped. When you start an application in cluster mode, DSE prints to the console the exact command to cancel its delegation token.
If you are using JAAS rather than a Kerberos ticket, you need to create a JAAS configuration file. The default location for this file is <$USER_HOME>/.java.login.config. If your JAAS configuration file is in a different location, you must specify it by setting the java.security.auth.login.config option to the location of the file.
For example, to set java.security.auth.login.config when submitting jobs, set the SPARK_SUBMIT_OPTS environment variable to point to your JAAS configuration file.
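One way to do this is shown below; the file path is only an illustration of the default location, so substitute the actual location of your JAAS configuration file.

```shell
# Pass the JAAS configuration file location to the Spark launcher JVM.
# $HOME/.java.login.config is the default location; adjust as needed.
export SPARK_SUBMIT_OPTS="-Djava.security.auth.login.config=$HOME/.java.login.config"
```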
To use a JAAS configuration file with Kerberos, you must set the kerberos.use.config.file option to true. You must also set the kerberos.client.reference.name option to the name of the JAAS configuration section, DseClient. For example:
dse spark -Dkerberos.use.config.file=true -Dkerberos.client.reference.name=DseClient
Configure the JAAS configuration file to use either a keytab file or the ticket cache.
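A minimal sketch of such a file, using the standard Java Krb5LoginModule. The principal and keytab path are placeholders; substitute your own, and note that the section name must match the kerberos.client.reference.name value (DseClient).

```
// Keytab-based login; principal and keyTab path are placeholders
DseClient {
  com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/path/to/user.keytab"
    principal="user@EXAMPLE.COM";
};
```

To use the ticket cache obtained with kinit instead of a keytab, replace the useKeyTab, keyTab, and principal options with useTicketCache=true.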