Spark security
DataStax Enterprise 4.5 supports password authentication in Spark and Shark. The authentication applies to the connection between Spark and Cassandra; it does not authenticate Spark components with each other. Supply the credentials through the following properties:
- cassandra.username
- cassandra.password
The Spark Web UI is not secured and might show the Spark configuration, including username, password, or delegation token when Kerberos is used.
For DataStax Enterprise Spark applications and tools, you can set up a .dserc file or use the Spark and Shark authentication commands to provide the login credentials.
The following examples show how to include Cassandra credentials in your applications:
Example: Passing hard-wired Cassandra credentials
import com.datastax.bdp.spark.DseSparkConfHelper._
import org.apache.spark.{SparkConf, SparkContext}

object AuthenticationExample extends App {

  def createSparkContext() = {
    val myJar = getClass.getProtectionDomain.getCodeSource.getLocation.getPath

    val conf = new SparkConf()
      .setAppName("Authentication example")
      .setMaster("local")
      .setJars(Array(myJar))
      .set("cassandra.username", "cassandra")
      .set("cassandra.password", "cassandra")
      .forDse

    new SparkContext(conf)
  }

  val sc = createSparkContext()

  // ...

  sc.stop()
}
Example: Prompting for Cassandra credentials
import com.datastax.bdp.spark.DseSparkConfHelper._
import org.apache.spark.{SparkConf, SparkContext}

object AuthenticationExample extends App {

  def createSparkContext() = {
    /* -Dcassandra.username=... and -Dcassandra.password=... arguments
       will be copied to system properties and removed from the args list */
    val args = setSystemPropertiesFromArgs(this.args)
    val myJar = getClass.getProtectionDomain.getCodeSource.getLocation.getPath

    val conf = new SparkConf()
      .setAppName("Authentication example")
      .setMaster("local")
      .setJars(Array(myJar))
      .forDse

    new SparkContext(conf)
  }

  val sc = createSparkContext()

  // ...

  sc.stop()
}
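A third option, not described in the examples above, is to read the credentials from the environment so that they do not appear in source code. The following is a minimal sketch under stated assumptions: the environment variable names CASSANDRA_USERNAME and CASSANDRA_PASSWORD and the helper credentialProperties are hypothetical names chosen for illustration and are not part of DataStax Enterprise. Only the property names cassandra.username and cassandra.password come from this page.

```scala
object EnvCredentialsExample {
  // Hypothetical environment variable names, chosen for this sketch only.
  val UserVar = "CASSANDRA_USERNAME"
  val PasswordVar = "CASSANDRA_PASSWORD"

  /** Builds the two Cassandra properties from an environment-like map.
    * Returns None if either variable is missing or empty. */
  def credentialProperties(env: Map[String, String]): Option[Map[String, String]] =
    for {
      user <- env.get(UserVar) if user.nonEmpty
      pass <- env.get(PasswordVar) if pass.nonEmpty
    } yield Map(
      "cassandra.username" -> user,
      "cassandra.password" -> pass
    )

  def main(args: Array[String]): Unit =
    credentialProperties(sys.env) match {
      case Some(props) =>
        // Print only the property names, never the values.
        println("Credentials found for: " + props.keys.toSeq.sorted.mkString(", "))
      case None =>
        println(s"Set $UserVar and $PasswordVar before starting the application.")
    }
}
```

Each key/value pair in the returned map can then be passed to SparkConf.set(key, value) in place of the hard-wired literals shown in the first example. Because the Spark Web UI might display the Spark configuration, this approach keeps credentials out of source code but does not hide them from the UI.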
You can configure a number of parameters to run your own Spark applications with DataStax Enterprise.
Security limitations
DataStax Enterprise 4.5 is limited in securing Spark data:
- No Kerberos support
- No SSL support
- Spark executors run under the same user account as DataStax Enterprise.
- The Spark Web UI is not secured and might show the Spark configuration, including the username and password.
DataStax recommends the following security practices:
- Expose Spark components to trusted users only.
- Allow only trusted users to access the file system.
Because Spark executors run under the same user account as DataStax Enterprise, an unapproved user can run a potentially malicious Spark program that accesses the file system on the nodes. System files and Cassandra SSTables are both vulnerable. Users who cannot access Cassandra files on the node, but whom you trust with your file system, can still read the temporary directories where RDD fragments are stored. A user with sufficient privileges can also execute malicious system commands. Password authentication protects Cassandra only if you also restrict direct access to the file system.
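One way to audit the file system recommendation is to check whether a directory grants any permission to users outside its owner and group. The following is a minimal sketch, assuming a POSIX file system; the object name DirPermissionCheck and the function openToOthers are hypothetical, and the directories to check are supplied by you on the command line because this page does not name them.

```scala
import java.nio.file.{Files, Path, Paths}
import java.nio.file.attribute.PosixFilePermission

object DirPermissionCheck {
  /** True if the path grants read, write, or execute permission to "others".
    * Throws UnsupportedOperationException on non-POSIX file systems. */
  def openToOthers(dir: Path): Boolean = {
    val perms = Files.getPosixFilePermissions(dir)
    perms.contains(PosixFilePermission.OTHERS_READ) ||
    perms.contains(PosixFilePermission.OTHERS_WRITE) ||
    perms.contains(PosixFilePermission.OTHERS_EXECUTE)
  }

  def main(args: Array[String]): Unit = {
    // Directories to check are passed as arguments; none are assumed here.
    args.foreach { name =>
      val dir = Paths.get(name)
      val status = if (openToOthers(dir)) "open to other users" else "restricted"
      println(dir.toString + ": " + status)
    }
  }
}
```

This check covers only the permission bits on the path itself. It does not inspect parent directories, group membership, or access control lists, so treat a "restricted" result as a starting point rather than proof that the data is protected.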
Providing credentials for Cassandra in a Spark application
This procedure describes how to write a Spark application that uses password authentication. The SparkContext itself is not authenticated. The authentication applies to the connection between Spark and Cassandra; it does not authenticate Spark components with each other.