Spark security

DataStax Enterprise supports password and LDAP authentication, Kerberos, and client-to-node encryption through SSL in Spark and Shark. These authentication mechanisms apply to connecting Spark to Cassandra; they do not authenticate Spark components with each other.

DataStax Enterprise also supports client-to-node encryption through SSL for all Spark to Cassandra connections. Internal Spark communication is not encrypted, because Spark does not support encrypting it.

Password and LDAP authentication

You can pass Cassandra credentials to Spark by setting the following properties in the SparkConf configuration object before creating the SparkContext:
  • cassandra.username
  • cassandra.password

For DataStax Enterprise Spark applications and tools, you can set up a .dserc file or use the Spark and Shark authentication commands to provide the login credentials.
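For example, a .dserc file in your home directory holds the login credentials as simple key-value pairs. A minimal sketch (the username/password key names follow the convention used by DSE tools; verify them against the documentation for your DSE version):

```
username=cassandra
password=cassandra
```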

The following examples show how to include Cassandra credentials in your applications:

Example: Passing hard-wired Cassandra credentials

import com.datastax.bdp.spark.DseSparkConfHelper._
import org.apache.spark.{SparkConf, SparkContext}

object AuthenticationExample extends App {

 def createSparkContext() = {
   val myJar = getClass.getProtectionDomain.getCodeSource.getLocation.getPath
  
   val conf = new SparkConf()
     .setAppName("Authentication example")
     .setMaster("local")
     .setJars(Array(myJar))
     .set("cassandra.username", "cassandra")
     .set("cassandra.password", "cassandra")
     .forDse

   new SparkContext(conf)
 }

 val sc = createSparkContext()

 // ...

 sc.stop()
}

Example: Prompting for Cassandra credentials

import com.datastax.bdp.spark.DseSparkConfHelper._
import org.apache.spark.{SparkConf, SparkContext}

object AuthenticationExample extends App {

 def createSparkContext() = {
   /*
     -Dcassandra.username=... and -Dcassandra.password=... arguments will be copied to system properties and removed
     from the args list
    */

   val args = setSystemPropertiesFromArgs(this.args)
   val myJar = getClass.getProtectionDomain.getCodeSource.getLocation.getPath

   val conf = new SparkConf()
     .setAppName("Authentication example")
     .setMaster("local")
     .setJars(Array(myJar))
     .forDse

   new SparkContext(conf)
 }

 val sc = createSparkContext()

 // ...

 sc.stop()
}

You can configure a number of parameters to run your own Spark applications with DataStax Enterprise.

Providing credentials for Cassandra in a Spark application

This procedure describes how to write a Spark application that uses password authentication. The SparkContext itself is not authenticated; the authentication applies to connecting Spark to Cassandra, not to authenticating Spark components between each other.
  1. In your application, import the DseSparkConfHelper package.
    import com.datastax.bdp.spark.DseSparkConfHelper._
  2. Set authentication properties.
    System.setProperty("cassandra.username", xxx)
    System.setProperty("cassandra.password", yyy)
  3. Create a new SparkContext, passing the result of SparkConf.forDse as an argument. The forDse method extends the SparkConf object for DataStax Enterprise.
    new SparkContext(args(0), "PortfolioDemo",
    new SparkConf().setJars(Array(myJar)).forDse)
    If the ~/.dserc file is not configured, use the DseSparkConfHelper method setSystemPropertiesFromArgs(args), where args are the command-line arguments passed to the main method; it finds arguments in the format -Dprop=value and copies them to system properties automatically.
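A minimal sketch combining these steps (the application name PortfolioDemo and the jar lookup are illustrative, and the hard-coded credentials stand in for your own):

```scala
import com.datastax.bdp.spark.DseSparkConfHelper._
import org.apache.spark.{SparkConf, SparkContext}

object PortfolioDemo extends App {

  // Step 2: set the authentication properties before creating the context
  System.setProperty("cassandra.username", "cassandra")
  System.setProperty("cassandra.password", "cassandra")

  // Locate the jar containing this class so it can be shipped to executors
  val myJar = getClass.getProtectionDomain.getCodeSource.getLocation.getPath

  // Step 3: extend the SparkConf for DataStax Enterprise with forDse
  val sc = new SparkContext(args(0), "PortfolioDemo",
    new SparkConf().setJars(Array(myJar)).forDse)

  // ... application logic ...

  sc.stop()
}
```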

Kerberos authentication

Kerberos authentication pertains to connecting Spark to Cassandra, not authenticating Spark components between each other. The Spark Web UI is not secured and might show the Spark configuration, including delegation token when using Kerberos.

SSL

Client-to-node encryption protects data in flight for the Spark Executor to Cassandra connections by establishing a secure channel between the client and the coordinator node. SSL is fully distributed and does not require setting up a shared authentication service. You need to prepare server certificates and enable client-to-node SSL.
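As a sketch, client-to-node SSL is enabled in cassandra.yaml under client_encryption_options; the keystore path and password below are placeholders for your own prepared server certificates:

```
client_encryption_options:
    enabled: true
    keystore: conf/.keystore
    keystore_password: myKeystorePassword
```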

Security limitations

DataStax Enterprise has the following limitations in securing Spark data:

  • Client-to-node encryption using SSL is supported for Spark Executor to Cassandra connections only.
  • Spark executors run under the same user account as DataStax Enterprise.
  • The Spark Web UI is not secured and might show the Spark configuration, including username, password, or delegation token when Kerberos is used.

DataStax recommends the following security practices:
  • Expose Spark components to trusted users only.
  • Allow only trusted users to access the file system.

Because Spark executors run under the same user account as DataStax Enterprise, an unapproved user can execute a potentially malicious Spark program that accesses the file system on the nodes. System files as well as Cassandra SSTables are vulnerable. Users who cannot access Cassandra files on the node, but who are trusted with the file system, can still read the temporary directories where RDD fragments are stored. With sufficient privileges, a user can also execute malicious system commands. Securing Cassandra with password authentication, LDAP, or Kerberos provides little protection unless you also restrict direct access to the file system.

Note: When user credentials are specified on the dse command line, as in dse -u username -p password, the credentials appear in the logs of Spark workers when the driver runs in cluster mode. The Spark Master, Worker, Executor, and Driver logs might include sensitive information, such as passwords and digest authentication tokens for Kerberos authentication mode that are passed on the command line or in the Spark configuration. DataStax recommends accessing the Spark user interface only over safe communication channels such as VPN or SSH.