DSE Analytics Security Checklist
DataStax recommends the following security practices:
- Enable client-to-node encryption using SSL.
- Run Spark ports for internode communications from within a secured network that has no exposure to outside traffic.
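As a rough client-side illustration of the first recommendation, the Python sketch below builds a TLS context that insists on verifying the server certificate, using only the standard `ssl` module. The function name `build_client_ssl_context` and its optional `ca_file` argument are assumptions made for this example and are not part of DSE; a driver or tool still has to be configured to use such a context.

```python
import ssl


def build_client_ssl_context(ca_file=None):
    """Return a TLS context that verifies the server certificate.

    ca_file is a hypothetical path to the CA certificate that signed the
    node certificates; when omitted, the system trust store is used.
    """
    # create_default_context enables certificate and hostname verification.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

Starting from the default context keeps certificate and hostname verification on, so a misconfigured client fails to connect rather than silently falling back to an unverified channel.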
Secure DataStax Enterprise Analytics nodes as follows:
- Authentication:
  - Distinct secrets for internode communication and for each application; see Configuring Spark nodes.
  - Native authentication for users of each application executor (run as) and isolation of related data; see Configuring Spark nodes.
  - Spark UI internal or LDAP authentication; see Monitoring Spark with the web interface.
  - User authentication for Spark jobs. DataStax Enterprise supports internal, LDAP, and Kerberos authentication for Spark.
    - Internal and LDAP: For DataStax Enterprise Spark applications and tools, use the Spark authentication commands to provide the authentication credentials; see Running spark-submit job with internal authentication.
    - Kerberos: Defining a Kerberos scheme applies only to connecting Spark to the DSE database, not to authenticating Spark components with each other. The Spark Web UI is not secured, so some parameters passed to the executor on the command line might be visible. However, the DSE username, password, and delegation token are hidden. By default, when Kerberos is the only authentication scheme, the Spark UI is inaccessible, so UI authorization must be disabled.
- Authorization:
  Role-based access control (RBAC) protects data pulled from the database for Spark jobs and controls access to Spark application submissions. The user running the request must have permission to access the data through their role assignment.
  Authorization is not available for the Spark UI master and workers.
- Auditing:
  - Analytic operations performed in Spark are recorded in the Spark event log. To enable logging, see Configuring Spark logging options.
  - CQL requests are recorded in the database logs; see Setting up database auditing.
- Transparent Data Encryption (TDE):
  TDE applies only to data stored in the database. DSE does not support encrypting data that is used by Spark and stored in DSEFS or local temporary directories.
- Encrypt data in-flight using SSL, TLS, or SASL:
  SSL/TLS: Client-to-node encryption protects data in flight for the Spark Executor to DSE database connections by establishing a secure channel between the client and the coordinator node. SSL does not require setting up a shared authentication service. You need to prepare server certificates and enable client-to-node SSL.
  SASL: Spark internode and client-to-cluster communication can be encrypted using the SASL Digest-MD5 mechanism for mutual authentication and encryption. SASL encryption is also available for communicating among Spark driver, Spark executors, and the external shuffle service (ExternalShuffleService). See Securing Spark connections for details.
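
One way to audit the SASL item is to check a Spark properties file for the relevant settings. The Python sketch below assumes the Apache Spark property names `spark.authenticate` and `spark.authenticate.enableSaslEncryption`, and assumes the properties are written one per line as whitespace-separated key and value. How DSE exposes these settings is described in Securing Spark connections, so treat the names as assumptions to verify for your version. The helper names `parse_properties` and `missing_sasl_settings` are hypothetical.

```python
# Apache Spark property names (an assumption; verify against your DSE version).
REQUIRED_SASL_PROPERTIES = {
    "spark.authenticate": "true",
    "spark.authenticate.enableSaslEncryption": "true",
}


def parse_properties(text):
    """Parse 'key value' lines into a dict, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def missing_sasl_settings(props):
    """Return the sorted names of required properties that are unset or not 'true'."""
    return sorted(
        key
        for key, expected in REQUIRED_SASL_PROPERTIES.items()
        if props.get(key, "").lower() != expected
    )
```

An empty result from `missing_sasl_settings(parse_properties(text))` means both settings are present in the file; it does not prove that the running cluster applies them.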