Configuration steps to enable Spark applications in cluster mode when JAR files are on
the Cassandra file system (CFS) and authentication is enabled.
When an application is submitted in cluster mode and its JAR files are on the Cassandra File System (CFS), the Spark Worker process is responsible for fetching the required JAR files. When authentication is enabled, the Spark Worker therefore needs authentication credentials for CFS. Because the Spark Worker starts executors for unrelated Spark jobs, granting the Spark Worker process credentials enables all future Spark jobs to pull their JAR dependencies from CFS. Credentials granted to the Spark Worker must be considered shared among all submitted applications, regardless of the submitting user. These shared credentials do not apply to accessing CFS from application code.

The default location of the spark-env.sh file depends on the type of installation:

- Installer-Services and Package installations: /etc/dse/spark/spark-env.sh
- Installer-No Services and Tarball installations: install_location/resources/spark/conf/spark-env.sh
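The shared credentials described above are typically passed to the Spark Worker process through spark-env.sh. The following is a minimal sketch only: the system-property names (`cassandra.username`, `cassandra.password`) and the user name and password values are assumptions, not values from this document — verify the exact settings for your DSE version.

```shell
# spark-env.sh -- pass shared CFS credentials to the Spark Worker process.
# NOTE: the property names and credential values below are illustrative
# placeholders; confirm them against the documentation for your DSE release.
export SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS \
  -Dcassandra.username=spark_cfs_user \
  -Dcassandra.password=choose_a_password"
```

Because every executor launched by this Worker inherits these credentials, they are effectively shared by all submitted applications, which is why the best practices below recommend a narrowly privileged user.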
Procedure

To enable Spark applications in cluster mode when JAR files are on the Cassandra file system (CFS) and authentication is enabled, follow these best practices:

- Create a unique user with privileges only on CFS (access to the related CFS keyspace), and then use that user's credentials for Spark Worker authentication. This practice limits the amount of protected information in the Cassandra database that is accessible to user Spark jobs without explicit permission.
- Create a distinct CFS directory and limit the directory access privileges to read only.
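The first best practice can be sketched with cqlsh as follows. The user name, password, and superuser credentials are hypothetical placeholders, and granting on the `cfs` keyspace assumes your CFS data lives in that keyspace — adjust for your cluster.

```shell
# Create a dedicated, non-superuser account whose only privilege is read
# access to the CFS keyspace, then use its credentials for the Spark Worker.
# 'spark_cfs_user', the passwords, and the 'cassandra' admin login are
# placeholders for illustration.
cqlsh -u cassandra -p cassandra -e "
  CREATE USER spark_cfs_user WITH PASSWORD 'choose_a_password' NOSUPERUSER;
  GRANT SELECT ON KEYSPACE cfs TO spark_cfs_user;"
```

Granting only SELECT keeps the shared Worker credentials read-only, in line with the second best practice of limiting directory access to read only.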