Enabling Spark apps in cluster mode when authentication is enabled
Configuration steps to enable Spark applications in cluster mode when JAR files are on the Cassandra file system (CFS) and authentication is enabled.
spark-env.sh
The default location of the spark-env.sh file depends on the type of installation:
- Package installations: /etc/dse/spark/spark-env.sh
- Tarball installations: installation_location/resources/spark/conf/spark-env.sh
Procedure
- To enable Spark applications in cluster mode when JAR files are on CFS and authentication is enabled, do one of the following (sketches of both options follow this step):
  - Add this statement to the spark-env.sh file on every DataStax Enterprise node:

    SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -Dspark.hadoop.cassandra.username=username -Dspark.hadoop.cassandra.password=password"

  - Before you start the DataStax Enterprise server process, set the SPARK_WORKER_OPTS environment variable in a way that guarantees visibility to DataStax Enterprise server processes.

  This environment variable does not need to be passed to applications that are submitted with the dse spark or dse spark-submit commands.
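  For example, on a package installation you might append the statement to spark-env.sh on each node. This is a minimal sketch; username and password are placeholders for the Spark Worker account credentials:

    echo 'SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -Dspark.hadoop.cassandra.username=username -Dspark.hadoop.cassandra.password=password"' | sudo tee -a /etc/dse/spark/spark-env.sh

  On a tarball installation, you might instead export the variable in the shell that starts the server, so it is visible to the DataStax Enterprise process. This sketch assumes the node is started in Spark mode with dse cassandra -k:

    export SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -Dspark.hadoop.cassandra.username=username -Dspark.hadoop.cassandra.password=password"
    installation_location/bin/dse cassandra -k

  With either setup, a cluster-mode submission that references a JAR on CFS needs no credentials on the command line; the class name and CFS path here are hypothetical:

    dse spark-submit --deploy-mode cluster --class com.example.MyApp cfs:///spark-jars/my-app.jar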
- Follow these best practices (a sketch of both practices follows this list):
  - Create a unique user with privileges only on CFS (access to the related CFS keyspace), and then use that user's credentials for Spark Worker authentication. This practice limits how much protected information in the database is accessible to user Spark jobs without explicit permission.
  - Create a distinct CFS directory and limit its access privileges to read-only.
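A minimal sketch of both practices, assuming DSE authentication with the default cassandra superuser; the spark_worker user, its password, the my-app.jar file, and the /spark-jars directory are hypothetical examples:

  # Create a user whose privileges are limited to the CFS keyspace.
  cqlsh -u cassandra -p cassandra -e "CREATE USER spark_worker WITH PASSWORD 'placeholder_password' NOSUPERUSER;"
  cqlsh -u cassandra -p cassandra -e "GRANT SELECT ON KEYSPACE cfs TO spark_worker; GRANT MODIFY ON KEYSPACE cfs TO spark_worker;"

  # Create a distinct CFS directory for the application JAR and make it
  # readable, but not writable, by users other than the owner.
  dse hadoop fs -mkdir /spark-jars
  dse hadoop fs -put my-app.jar /spark-jars/
  dse hadoop fs -chmod 755 /spark-jars
  dse hadoop fs -chmod 644 /spark-jars/my-app.jar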