Accessing HDFS or CFS resources using Kerberos authentication
If you are using Kerberos authentication and need to access HDFS or CFS data from BYOS nodes, follow these steps to configure DSE and Spark.
Procedure
-
Copy hdfs-site.xml from your Hadoop configuration directory to the DSE Hadoop configuration directory:
scp hdfs-site.xml admin@dsenode:/etc/dse/hadoop2-client/conf/
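To verify that Spark picks up the copied file, you can inspect the Hadoop configuration from the dse spark shell, where sc is predefined. This is a minimal sketch: dfs.namenode.kerberos.principal is a standard Hadoop property, but the value shown in the comment is only an assumed example.
// Sketch: confirm hdfs-site.xml settings are visible to Spark.
sc.hadoopConfiguration.get("dfs.namenode.kerberos.principal")
// e.g. returns hdfs/_HOST@EXAMPLE.COM if the file was copied correctly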
-
Pass a comma-separated list of HDFS or CFS root directories with the spark.dse.access.namenodes parameter when using DSE Spark commands. The spark.dse.access.namenodes parameter has the same effect as spark.yarn.access.namenodes in stand-alone Spark. The Spark application must have access to the listed nodes, and Kerberos must be configured so the application can authenticate to them; the clusters must be in the same realm or in a trusted realm. DSE Spark acquires security tokens for each of the nodes so the Spark application can access those remote HDFS or CFS clusters. For example:
dse spark --conf spark.dse.access.namenodes=cfs://node1/,hdfs://node2/,webhdfs://node3:50070
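After the tokens are acquired, the remote clusters listed in spark.dse.access.namenodes can be read directly from the shell. A minimal sketch, run inside the dse spark shell; the file path is a hypothetical example.
// Sketch: read from a remote Kerberos-protected HDFS cluster
// listed in spark.dse.access.namenodes.
val lines = sc.textFile("hdfs://node2/data/events.log")
println(lines.count())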
-
Pass a comma-separated list of HDFS or CFS root directories with the spark.yarn.access.namenodes parameter when using stand-alone Spark commands:
spark-shell --master yarn-client --jars dse-byos*.jar --properties-file merged.conf --conf spark.yarn.access.namenodes=cfs://node1/,hdfs://node2/,webhdfs://node3:50070