Accessing HDFS or CFS resources using Kerberos authentication
If you are using Kerberos authentication and need to access HDFS or CFS data from BYOS nodes, follow these steps to configure DSE and Spark.
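Before starting, verify that the user who will run the Spark commands holds a valid Kerberos ticket. A minimal check using kinit and klist, the standard MIT Kerberos client tools; the principal shown is a hypothetical example:

  # Obtain a ticket for the user that will run Spark
  # (the principal alice@EXAMPLE.COM is a hypothetical example)
  kinit alice@EXAMPLE.COM

  # Confirm that the credential cache now holds a valid ticket
  klist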
- Copy hdfs-site.xml from your Hadoop configuration directory to the DSE Hadoop configuration directory:

  scp hdfs-site.xml admin@dsenode:/etc/dse/hadoop2-client/conf/
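  If you run DSE Spark commands from more than one node, the file needs to be present on each of them. A minimal sketch, assuming the hypothetical host names dsenode1 through dsenode3:

  # Copy the Hadoop client config to each DSE node
  # (host names are hypothetical examples)
  for host in dsenode1 dsenode2 dsenode3; do
    scp hdfs-site.xml admin@${host}:/etc/dse/hadoop2-client/conf/
  done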
- Pass a comma-separated list of HDFS or CFS root directories with the spark.dse.access.namenodes parameter when using DSE Spark commands.

  The spark.dse.access.namenodes parameter has the same effect as spark.yarn.access.namenodes in stand-alone Spark. The Spark application must have access to the listed nodes, and Kerberos must be configured so that the application can authenticate to them: the nodes must be in the same realm or in a trusted realm. DSE Spark acquires a security token for each of the nodes so the Spark application can access those remote HDFS or CFS clusters.
  dse spark --conf spark.dse.access.namenodes=cfs://node1/,hdfs://node2/,webhdfs://node3:50070
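  The same setting applies to other DSE Spark commands, for example a batch job submitted with dse spark-submit. A minimal sketch, where the application jar and class are hypothetical:

  # Submit a batch job with access to a remote Kerberized HDFS
  # (my-app.jar and com.example.ReadRemoteData are hypothetical examples)
  dse spark-submit --conf spark.dse.access.namenodes=hdfs://node2/ --class com.example.ReadRemoteData my-app.jar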
- Pass a comma-separated list of HDFS or CFS root directories with the spark.yarn.access.namenodes parameter when using stand-alone Spark commands:

  spark-shell --master yarn-client --jars dse-byos*.jar --properties-file merged.conf --conf spark.yarn.access.namenodes=cfs://node1/,hdfs://node2/,webhdfs://node3:50070
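  Once the shell has started, Spark holds a delegation token for each listed filesystem. As a quick sanity check, you can pipe a one-line read into the shell; the HDFS path below is a hypothetical example:

  # Count the lines of a file on the remote Kerberized HDFS
  # (the path hdfs://node2/tmp/test.txt is a hypothetical example)
  echo 'println(sc.textFile("hdfs://node2/tmp/test.txt").count())' | \
    spark-shell --master yarn-client --jars dse-byos*.jar \
      --properties-file merged.conf \
      --conf spark.yarn.access.namenodes=hdfs://node2/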