Running Spark jobs with Kerberos
Spark jobs may be run against a Kerberos-enabled DataStax Enterprise database. Defining a Kerberos scheme only connects Spark to the DSE database; it does not authenticate Spark components with each other.
Authenticate using the kinit command before starting the Spark job.
After the application finishes, the delegation token should be cancelled. In client mode this happens automatically: the token is cancelled when the user stops the application. In cluster mode, however, the user must cancel the delegation token manually when the application is eventually stopped. When you start an application in cluster mode, DSE prints to the console the exact command to cancel its delegation token.
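As a sketch, a typical client-mode submission might look like the following. The principal, application class, and jar name are placeholder assumptions, not values from this document:

```shell
# Obtain a Kerberos ticket before starting the Spark job
kinit jsmith@EXAMPLE.COM

# Submit the job in client mode; the delegation token is
# cancelled automatically when the application is stopped
dse spark-submit --class com.example.MyApp my-app.jar
```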
Procedure
- If you are using JAAS rather than a Kerberos ticket, you need to create a JAAS configuration file. The default location for this file is $USER_HOME/.java.login.config. If your JAAS configuration file is in a different location, specify that location by setting the java.security.auth.login.config option. For example, to set java.security.auth.login.config in an environment variable for submitting jobs, set the SPARK_SUBMIT_OPTS environment variable to point to your JAAS configuration file:

export SPARK_SUBMIT_OPTS='-Djava.security.auth.login.config=/path/jaas.config'
- To use a JAAS configuration file with Kerberos, set the kerberos.use.config.file option to true.
- You must also set the kerberos.client.reference.name option to DseClient. For example:

dse spark -Dkerberos.use.config.file=true -Dkerberos.client.reference.name=DseClient
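A JAAS configuration file for the steps above might look like the following sketch. The entry name must match the kerberos.client.reference.name value (DseClient); Krb5LoginModule is the standard JDK Kerberos login module, and useTicketCache is one of its standard options. The exact options your deployment needs may differ:

```
DseClient {
  com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=true;
};
```

With useTicketCache=true, the login module reuses the ticket obtained earlier with kinit rather than prompting for credentials.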