DSE Analytics security checklist

DataStax recommends the following security practices:

  • Enable client-to-node encryption using SSL.

  • Run Spark ports for internode communication within a secured network that is not exposed to outside traffic.

Secure DataStax Enterprise Analytics nodes as follows:

  • Authentication:

    • Distinct secrets for internode communication and for each application; see Configuring Spark nodes.

    • Native authentication for the user that each application executor runs as (run as), and isolation of the related data; see Configuring Spark nodes.

    • Internal or LDAP authentication for the Spark UI; see Monitoring Spark with the web interface.

    • User authentication for Spark jobs. DataStax Enterprise supports internal, LDAP, and Kerberos authentication for Spark.

      • Internal and LDAP: For DataStax Enterprise Spark applications and tools, use the Spark authentication commands to provide credentials; see Running spark-submit job with internal authentication, and the sketch at the end of this Authentication section.

      • Kerberos: Defining a Kerberos scheme applies to connecting Spark to the DSE database, not to authenticating Spark components with each other. The Spark Web UI is not secured, so some parameters passed to the executor on the command line might be visible; however, the DSE username, password, and delegation token are hidden. By default, when Kerberos is the only authentication scheme, the Spark UI is inaccessible, so UI authorization must be disabled.
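
    As a rough illustration only, the following Scala sketch supplies internal authentication credentials to a Spark application through the Spark Cassandra Connector properties spark.cassandra.auth.username and spark.cassandra.auth.password. The role name, password, keyspace, and table are placeholders; in practice, pass credentials through dse spark-submit or a protected configuration file rather than hard-coding them.

      import org.apache.spark.sql.SparkSession

      object AuthenticatedJobSketch {
        def main(args: Array[String]): Unit = {
          // Placeholder credentials for illustration only; supply real credentials
          // through dse spark-submit or a protected properties file, never in code.
          val spark = SparkSession.builder()
            .appName("authenticated-analytics-job")
            .config("spark.cassandra.auth.username", "analytics_role") // DSE role (placeholder)
            .config("spark.cassandra.auth.password", "replace-me")     // role password (placeholder)
            .getOrCreate()

          // Reads issued through the connector run as analytics_role, so the
          // role's RBAC permissions on the keyspace and table apply to this job.
          val rows = spark.read
            .format("org.apache.spark.sql.cassandra")
            .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table")) // placeholders
            .load()

          rows.show()
          spark.stop()
        }
      }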

  • Authorization:

    Role-based access control (RBAC) protects data pulled from the database for Spark jobs and governs who can submit Spark applications. The user running the request must have a role that grants permission to access the data.

    Authorization is not available for the Spark Master and Worker UIs.

  • Auditing:

  • Transparent Data Encryption (TDE):

    TDE applies only to data stored in the database. DSE does not support encrypting data that is used by Spark and stored in DSEFS or local temporary directories.

  • Encrypt data in-flight using SSL, TLS, or SASL:

    SSL/TLS: Client-to-node encryption protects data in flight on the connections between Spark executors and the DSE database by establishing a secure channel between the client and the coordinator node. SSL does not require setting up a shared authentication service, but you must prepare server certificates and enable client-to-node SSL.

    SASL: Spark internode and client-to-cluster communication can be encrypted with the SASL Digest-MD5 mechanism, which provides mutual authentication and encryption. SASL encryption is also available for communication among the Spark driver, the Spark executors, and the external shuffle service (ExternalShuffleService). See Securing Spark connections for details and the sketch below.
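
    The following Scala sketch names the open-source Spark and Spark Cassandra Connector properties behind the two mechanisms above. It is illustrative only: in a DSE cluster these settings are normally managed through DSE's own configuration files rather than in application code, and the truststore path and password are placeholders.

      import org.apache.spark.SparkConf
      import org.apache.spark.sql.SparkSession

      object EncryptedConnectionsSketch {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf()
            .setAppName("encrypted-analytics-job")
            // Client-to-node SSL between Spark executors and the DSE database
            // (Spark Cassandra Connector properties; values are placeholders).
            .set("spark.cassandra.connection.ssl.enabled", "true")
            .set("spark.cassandra.connection.ssl.trustStore.path", "/path/to/client.truststore")
            .set("spark.cassandra.connection.ssl.trustStore.password", "replace-me")
            // SASL mutual authentication and encryption for Spark's own
            // driver, executor, and shuffle traffic (standard Spark properties).
            .set("spark.authenticate", "true")
            .set("spark.authenticate.enableSaslEncryption", "true")

          val spark = SparkSession.builder().config(conf).getOrCreate()
          // ... run the job over the encrypted connections ...
          spark.stop()
        }
      }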
