Securing Spark

Information about Spark security and steps to configure DataStax Enterprise security for Spark.

Securing Spark within DataStax Enterprise consists of two different security concepts:

Securing communication between Spark nodes 

When the spark_security_enabled configuration option in dse.yaml is set to true, Spark nodes within a data center use a shared secret to secure communication between nodes. Each data center has a different shared secret. The shared secret is stored in the dse_security.spark_security system table.
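Before relying on the shared secret, it can help to confirm that the option is actually switched on. The following is a minimal sketch, not a DSE tool: the function name is hypothetical, and it assumes the option appears in dse.yaml as a single uncommented line of the form "spark_security_enabled: true".

```shell
# Hypothetical helper: succeed if spark_security_enabled is set to true
# in the dse.yaml file given as the first argument.
spark_security_enabled() {
    # Match an uncommented "spark_security_enabled: true" line,
    # allowing surrounding whitespace and a trailing comment.
    grep -Eq '^[[:space:]]*spark_security_enabled:[[:space:]]*true[[:space:]]*(#.*)?$' "$1"
}

# Example use (path shown is the package-install location):
#   spark_security_enabled /etc/dse/dse.yaml && echo "Spark security is enabled"
```

The function only inspects the file on disk; a node still has to be restarted with that configuration for the setting to take effect.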

You should enable client-to-node encryption when securing Spark communication, as the bootstrap process for each client retrieves the Spark security configuration, including the shared secret.

SSL client-to-node encryption

Client-to-node encryption protects data in flight for the Spark Executor to Cassandra connections by establishing a secure channel between the client and the coordinator node. SSL is fully distributed and does not require setting up a shared authentication service. You need to prepare server certificates and enable client-to-node SSL.

Spark internode and client-to-cluster communication can also be encrypted using SSL by enabling it server-side in dse.yaml and client-side in the Spark configuration file spark-defaults.conf. See Spark SSL encryption for details.

The location of the dse.yaml file depends on the type of installation:
  Installer-Services      /etc/dse/dse.yaml
  Package installations   /etc/dse/dse.yaml
  Installer-No Services   install_location/resources/dse/conf/dse.yaml
  Tarball installations   install_location/resources/dse/conf/dse.yaml
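The lookup described above can be sketched as a small shell function. This is an illustration only: DSE_INSTALL_LOCATION is a hypothetical variable standing in for install_location, and the function name is not part of DSE.

```shell
# Hypothetical helper: print the path of the first dse.yaml found.
# An explicitly set DSE_INSTALL_LOCATION (No Services and tarball installs)
# is checked first, then the Services/package location.
find_dse_yaml() {
    if [ -n "${DSE_INSTALL_LOCATION:-}" ] &&
       [ -f "$DSE_INSTALL_LOCATION/resources/dse/conf/dse.yaml" ]; then
        printf '%s\n' "$DSE_INSTALL_LOCATION/resources/dse/conf/dse.yaml"
    elif [ -f /etc/dse/dse.yaml ]; then
        printf '%s\n' /etc/dse/dse.yaml
    else
        # No dse.yaml found in either documented location.
        return 1
    fi
}

# Example use:
#   DSE_YAML=$(find_dse_yaml) || echo "dse.yaml not found" >&2
```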

User authentication 

DataStax Enterprise supports password, LDAP, and Kerberos authentication in Spark. To use internal authentication, see Running spark-submit job with internal authentication.

For DataStax Enterprise Spark applications and tools, you can set up a .dserc file or use the Spark authentication commands to provide the login credentials. See Launching Spark for information on specifying the authentication credentials.

Kerberos authentication

Kerberos authentication applies to connecting Spark to Cassandra, not to authenticating Spark components with each other. The Spark Web UI is not secured and might show the Spark configuration, including the delegation token, when using Kerberos.

Kerberos with Spark

With Kerberos authentication, the Spark launcher connects to DSE with Kerberos credentials and requests DSE to generate a delegation token. The Spark driver and executors use the delegation token to connect to the cluster. For valid authentication, the delegation token must be renewed periodically. For security reasons, the user who is authenticated with the token should not be able to renew it. Therefore, delegation tokens have two associated users: token owner and token renewer.

The token renewer is set to none so that only a DSE internal process can renew the token. When the application is submitted, DSE automatically renews the delegation tokens that are associated with the Spark application. When the application is unregistered (finished), delegation token renewal stops and the token is cancelled.

Set Kerberos options.

If you use JAAS rather than a Kerberos ticket, you need to create a JAAS configuration file. The default location for this file is $USER_HOME/.java.login.config. If your JAAS configuration file is in a different location, you must specify that location by setting the java.security.auth.login.config option.

For example, to set java.security.auth.login.config in an environment variable for submitting jobs, set the SPARK_SUBMIT_OPTS environment variable to point to your JAAS configuration file:

export SPARK_SUBMIT_OPTS='-Djava.security.auth.login.config=/path/jaas.config'
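The rule about the default location can be wrapped in a small function, sketched here under a few assumptions: the function name is hypothetical, $HOME is taken to be the user home directory referred to above, and the option is appended to any existing SPARK_SUBMIT_OPTS value rather than replacing it.

```shell
# Hypothetical helper: export SPARK_SUBMIT_OPTS for a JAAS file that is
# not in the default location. Fails if the file cannot be read.
use_jaas_config() {
    jaas_file="$1"
    default_file="$HOME/.java.login.config"   # assumed default location

    if [ ! -r "$jaas_file" ]; then
        echo "JAAS configuration file not readable: $jaas_file" >&2
        return 1
    fi

    # The option is needed only when the file is outside the default location.
    if [ "$jaas_file" != "$default_file" ]; then
        SPARK_SUBMIT_OPTS="${SPARK_SUBMIT_OPTS:+$SPARK_SUBMIT_OPTS }-Djava.security.auth.login.config=$jaas_file"
        export SPARK_SUBMIT_OPTS
    fi
}

# Example use:
#   use_jaas_config /path/jaas.config
```

Checking readability first turns a missing or mistyped path into an immediate error instead of an authentication failure at job submission.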

To use a JAAS configuration file with Kerberos, you must set the kerberos.use.config.file option to true.

You must also set the kerberos.client.reference.name option to DseClient. For example:

dse spark -Dkerberos.use.config.file=true -Dkerberos.client.reference.name=DseClient

Here is an example JAAS configuration file:

DseClient
{
    com.sun.security.auth.module.Krb5LoginModule required
    ...
};
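Because kerberos.client.reference.name must match the entry name in the JAAS file, a quick check before launching can catch a mismatch. The sketch below is a simple text match, not a JAAS parser; the function name is hypothetical, and it assumes the entry name starts a line and is followed either by an opening brace or by the end of the line.

```shell
# Hypothetical helper: succeed if the JAAS file ($1) defines an entry
# with the given name ($2, default DseClient).
jaas_has_entry() {
    entry="${2:-DseClient}"
    # The entry name must be followed by "{" or stand alone on its line.
    grep -Eq '^[[:space:]]*'"$entry"'[[:space:]]*(\{|$)' "$1"
}

# Example use:
#   jaas_has_entry /path/jaas.config DseClient || echo "DseClient entry missing" >&2
```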

Security limitations 

DataStax Enterprise has the following limitations in securing Spark data:

  • Client-to-node encryption using SSL is supported for Spark Executor to Cassandra connections only.
  • Spark executors run under the same user account as DataStax Enterprise.
  • The Spark Web UI is not secured and might show the Spark configuration, including username, password, or delegation token when Kerberos is used.
  • DataStax Enterprise provides internal authentication support for some Hadoop tools and for connecting Spark to Cassandra, not for authenticating Spark components with each other.

DataStax recommends the following security practices:
  • Enable client-to-node encryption using SSL.
  • Expose Spark components to trusted users only.
  • Allow only trusted users to access the file system.

Because Spark executors run under the same user account as DataStax Enterprise, an unapproved user can execute a potentially malicious Spark program that can access the file system on the nodes. System files as well as Cassandra SSTables are vulnerable. Users who cannot access Cassandra files on the node, but whom you entrust with your file system, can access the temporary directories where RDD fragments are stored. With sufficient privileges, a user can also execute malicious system commands. Using password authentication, LDAP, or Kerberos to secure Cassandra is ineffective unless you also restrict direct access to the file system.
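As one small part of restricting file system access, you can check whether a directory grants any permission to users outside its owner and group. The following is a rough sketch only: the function name is hypothetical, it looks at the mode bits of the top-level directory alone, and it ignores ACLs and group membership. The directories to check (for example, the Cassandra data directory or the Spark temporary directories) depend on your installation and are not specified here.

```shell
# Hypothetical helper: succeed if the directory ($1) grants no read, write,
# or execute permission to "other" users. Returns 2 if $1 is not a directory.
dir_is_private() {
    [ -d "$1" ] || return 2
    # -prune limits find to the directory itself; the path is printed only
    # when at least one "other" permission bit is set.
    [ -z "$(find "$1" -prune \( -perm -001 -o -perm -002 -o -perm -004 \) 2>/dev/null)" ]
}

# Example use (directory path is a placeholder):
#   dir_is_private /path/to/data || echo "directory is accessible to other users" >&2
```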