Accessing HDFS or CFS resources using Kerberos authentication

If you are using Kerberos authentication and need to access HDFS or CFS data from BYOS nodes, follow these steps to configure DSE and Spark.

Procedure

  1. Copy hdfs-site.xml from your Hadoop configuration directory to the DSE Hadoop configuration directory. For example:

    scp hdfs-site.xml admin@dsenode:/etc/dse/hadoop2-client/conf/
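
    After copying the file, you can confirm from the DSE Spark shell that the Hadoop configuration is visible to Spark. This is only an illustrative check: "dfs.nameservices" is an example property name, so substitute a setting that your hdfs-site.xml actually defines.

    // Hypothetical check from inside the `dse spark` shell. "dfs.nameservices"
    // is an example property name; substitute one from your own hdfs-site.xml.
    println(sc.hadoopConfiguration.get("dfs.nameservices"))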
  2. Pass a comma-separated list of HDFS or CFS root directories with the spark.dse.access.namenodes parameter when using DSE Spark commands.

    The spark.dse.access.namenodes parameter has the same effect as spark.yarn.access.namenodes in stand-alone Spark.

    The Spark application must be able to reach these nodes, and Kerberos must be configured so that it can authenticate to them: the remote clusters must be in the same Kerberos realm or in a trusted realm.

    DSE Spark acquires a security token for each of the listed nodes so that the Spark application can access those remote HDFS or CFS clusters. For example:

    dse spark --conf spark.dse.access.namenodes=cfs://node1/,hdfs://node2/,webhdfs://node3:50070
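
    Once the shell starts, the acquired tokens allow reads from the listed clusters. The following is a minimal sketch from inside the shell, where both file paths are hypothetical placeholders for data on your remote clusters:

    // Read from the remote CFS and HDFS clusters named in
    // spark.dse.access.namenodes; both paths are hypothetical examples.
    val cfsLines = sc.textFile("cfs://node1/data/example.txt")
    val hdfsLines = sc.textFile("hdfs://node2/data/example.txt")
    println(cfsLines.count() + hdfsLines.count())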
  3. Pass a comma-separated list of HDFS or CFS root directories with the spark.yarn.access.namenodes parameter when using stand-alone Spark commands. For example:

    spark-shell --master yarn-client --jars dse-byos*.jar --properties-file merged.conf --conf spark.yarn.access.namenodes=cfs://node1/,hdfs://node2/,webhdfs://node3:50070
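
    If you submit an application instead of using the shell, the same property can be set in code before the SparkContext is created. This is a minimal sketch, assuming yarn-client mode, where delegation tokens for the listed clusters are fetched when the context starts; the application name is hypothetical, and the master is still supplied with --master yarn-client at submit time:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical stand-alone application. spark.yarn.access.namenodes must
    // be set before the SparkContext is created so that delegation tokens for
    // the listed clusters are fetched at startup.
    val conf = new SparkConf()
      .setAppName("byos-kerberos-example")
      .set("spark.yarn.access.namenodes",
        "cfs://node1/,hdfs://node2/,webhdfs://node3:50070")
    val sc = new SparkContext(conf)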
