Spark security
DataStax Enterprise supports Password and LDAP authentication, Kerberos, client-to-node encryption through SSL security in Spark and Shark, and Spark SSL encryption.
Password and LDAP authentication
To authenticate to Cassandra, provide the following properties:
- cassandra.username
- cassandra.password
For DataStax Enterprise Spark applications and tools, you can set up a .dserc file or use the Spark and Shark authentication commands to provide the login credentials.
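As a sketch, a .dserc file in the home directory of the user running the application supplies the credentials as simple key-value pairs. The key names shown here are an assumption; verify them against the documentation for your DSE release.

```
# ~/.dserc — Cassandra credentials for DSE Spark and Shark tools
# (key names assumed; check your DSE version's documentation)
username=cassandra
password=cassandra
```

Because the file contains a plaintext password, restrict its permissions to the owning user (for example, chmod 600 ~/.dserc).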
The following examples show how to include Cassandra credentials in your applications:
Example: Passing hard-wired Cassandra credentials
import com.datastax.bdp.spark.DseSparkConfHelper._
import org.apache.spark.{SparkConf, SparkContext}

object AuthenticationExample extends App {

  def createSparkContext() = {
    val myJar = getClass.getProtectionDomain.getCodeSource.getLocation.getPath

    val conf = new SparkConf()
      .setAppName("Authentication example")
      .setMaster("local")
      .setJars(Array(myJar))
      .set("cassandra.username", "cassandra")
      .set("cassandra.password", "cassandra")
      .forDse

    new SparkContext(conf)
  }

  val sc = createSparkContext()
  // ...
  sc.stop()
}
Example: Prompting for Cassandra credentials
import com.datastax.bdp.spark.DseSparkConfHelper._
import org.apache.spark.{SparkConf, SparkContext}

object AuthenticationExample extends App {

  def createSparkContext() = {
    /* -Dcassandra.username=... and -Dcassandra.password=... arguments
       are copied to system properties and removed from the args list */
    val args = setSystemPropertiesFromArgs(this.args)

    val myJar = getClass.getProtectionDomain.getCodeSource.getLocation.getPath

    val conf = new SparkConf()
      .setAppName("Authentication example")
      .setMaster("local")
      .setJars(Array(myJar))
      .forDse

    new SparkContext(conf)
  }

  val sc = createSparkContext()
  // ...
  sc.stop()
}
You can configure a number of parameters to run your own Spark applications with DataStax Enterprise.
Providing credentials for Cassandra in a Spark application
- Include the instruction in your application to import the DseSparkConfHelper package:
  import com.datastax.bdp.spark.DseSparkConfHelper._
- Set the authentication properties:
  System.setProperty("cassandra.username", "xxx")
  System.setProperty("cassandra.password", "yyy")
- Create a new SparkContext, passing SparkConf.forDse as an argument. The forDse method extends the SparkConf object for DataStax Enterprise.
If the ~/.dserc file is not configured, use the DseSparkConfHelper method setSystemPropertiesFromArgs(args), where args are the command-line arguments passed to the main method. It finds properties in the format -Dprop=value, copies them to the system properties automatically, and removes them from the argument list:
setSystemPropertiesFromArgs(args)
new SparkContext(args(0), "PortfolioDemo", new SparkConf().setJars(Array(myJar)).forDse)
Kerberos authentication
Kerberos authentication applies to connecting Spark to Cassandra, not to authenticating Spark components with one another. The Spark Web UI is not secured and might show the Spark configuration, including the delegation token when Kerberos is used.
- Kerberos with Spark
With Kerberos authentication, the Spark launcher connects to DSE with Kerberos credentials and requests that DSE generate a delegation token. The Spark driver and executors use the delegation token to connect to the cluster. To remain valid, the delegation token must be renewed periodically. For security reasons, the user who is authenticated with the token should not be able to renew it. Therefore, delegation tokens have two associated users: the token owner and the token renewer.
The token renewer is set to none so that only a DSE internal process can renew the token. When the application is submitted, DSE automatically renews the delegation tokens associated with the Spark application. When the application is unregistered (finished), delegation token renewal stops and the token is cancelled.
Spark to Cassandra SSL encryption
Client-to-node encryption protects data in flight for Spark executor-to-Cassandra connections by establishing a secure channel between the client and the coordinator node. SSL is fully distributed and does not require setting up a shared authentication service. You need to prepare server certificates and enable client-to-node SSL.
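As a sketch, client-to-node encryption is enabled in the client_encryption_options section of cassandra.yaml; the keystore and truststore paths and passwords below are placeholders for your own certificates:

```
# cassandra.yaml — enable client-to-node SSL (illustrative values)
client_encryption_options:
  enabled: true
  keystore: /path/to/conf/.keystore
  keystore_password: myKeyPass
  # Optional: require client certificates as well (two-way SSL)
  # require_client_auth: true
  # truststore: /path/to/conf/.truststore
  # truststore_password: myTrustPass
```

After changing these options, restart the node so the new settings take effect.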
Spark SSL encryption
Spark internode and client-to-cluster communication can also be encrypted using SSL by enabling it server-side in dse.yaml and client-side in the Spark configuration file spark-defaults.conf. See Spark SSL encryption for details.
The location of the dse.yaml file depends on the type of installation:
Installer-Services | /etc/dse/dse.yaml
Package installations | /etc/dse/dse.yaml
Installer-No Services | install_location/resources/dse/conf/dse.yaml
Tarball installations | install_location/resources/dse/conf/dse.yaml
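On the client side, the corresponding SSL settings go in spark-defaults.conf. A sketch using the standard Apache Spark spark.ssl.* properties follows; paths and passwords are placeholders, and DSE versions may support additional keys:

```
# spark-defaults.conf — client-side Spark SSL settings (illustrative)
spark.ssl.enabled             true
spark.ssl.keyStore            /path/to/conf/.keystore
spark.ssl.keyStorePassword    myKeyPass
spark.ssl.trustStore          /path/to/conf/.truststore
spark.ssl.trustStorePassword  myTrustPass
spark.ssl.protocol            TLS
```

Keep file permissions on spark-defaults.conf restrictive, since it contains keystore and truststore passwords in plaintext.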
Security limitations
DataStax Enterprise is limited in securing Spark data:
- Client-to-node encryption using SSL is supported for Spark Executor to Cassandra connections only.
- Spark executors run under the same user account as DataStax Enterprise.
- The Spark Web UI is not secured and might show the Spark configuration, including username, password, or delegation token when Kerberos is used.
- DataStax Enterprise provides internal authentication support for some Hadoop tools and for connecting Spark to Cassandra, not for authenticating Spark components with one another.
- Expose Spark components to trusted users only.
- Allow only trusted users to access the file system.
Because Spark executors run under the same user account as DataStax Enterprise, an unapproved user can execute a potentially malicious Spark program that accesses the file system on the nodes. System files as well as Cassandra SSTables are vulnerable. Users who cannot access Cassandra files on the node, but whom you entrust with your file system, can access the temporary directories where RDD fragments are stored. With sufficient privileges, a user can also execute malicious system commands. Securing Cassandra with password authentication, LDAP, or Kerberos provides little protection unless you also restrict direct access to the file system.