Accessing HDFS or CFS resources using Kerberos authentication

If you are using Kerberos authentication and need to access HDFS or CFS data from BYOS nodes, follow these steps to configure DSE and Spark.

Procedure

Copy hdfs-site.xml from your Hadoop configuration directory to the DSE Hadoop configuration directory on the DSE node:
```
scp hdfs-site.xml admin@dsenode:/etc/dse/hadoop2-client/conf/
```
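Before copying, it can help to confirm that the file is where you expect it. The following is a minimal sketch, not part of DSE: `check_hdfs_site` is a hypothetical helper, and the `HADOOP_CONF_DIR` variable and the `/etc/hadoop/conf` default are assumptions about your Hadoop installation.

```shell
# Hypothetical pre-flight check: succeed only if <dir>/hdfs-site.xml is readable.
check_hdfs_site() {
  [ -r "$1/hdfs-site.xml" ]
}

# Assumption: HADOOP_CONF_DIR names your Hadoop configuration directory;
# /etc/hadoop/conf is only an illustrative default.
conf_dir="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
if check_hdfs_site "$conf_dir"; then
  echo "hdfs-site.xml found in $conf_dir"
else
  echo "hdfs-site.xml not found in $conf_dir" >&2
fi
```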
Pass a comma-separated list of HDFS or CFS root directories with the spark.dse.access.namenodes parameter when using DSE Spark commands.

The spark.dse.access.namenodes parameter has the same effect as spark.yarn.access.namenodes in stand-alone Spark.

The Spark application must be able to reach these nodes, and Kerberos must be configured so that the application can authenticate to them. The nodes must be either in the same realm or in a trusted realm.

DSE Spark acquires security tokens for each of the nodes so the Spark application can access those remote HDFS or CFS clusters.
```
dse spark --conf spark.dse.access.namenodes=cfs://node1/,hdfs://node2/,webhdfs://node3:50070
```
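Because token acquisition depends on working Kerberos credentials, you may want to check for a valid ticket before launching the command. This is a minimal sketch that assumes the MIT Kerberos `klist` client is installed; `has_valid_ticket` is a hypothetical helper name, not a DSE command.

```shell
# Sketch: confirm a valid Kerberos ticket exists before launching DSE Spark.
# Assumption: MIT Kerberos klist, where `klist -s` prints nothing and exits 0
# only when the credentials cache holds an unexpired ticket.
has_valid_ticket() {
  command -v klist >/dev/null 2>&1 && klist -s
}

if has_valid_ticket; then
  echo "Kerberos ticket found"
else
  echo "No valid Kerberos ticket; run kinit first" >&2
fi
```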

Pass a comma-separated list of HDFS or CFS root directories with the spark.yarn.access.namenodes parameter when using stand-alone Spark commands:

```
spark-shell --master yarn-client --jars dse-byos*.jar --properties-file merged.conf --conf spark.yarn.access.namenodes=cfs://node1/,hdfs://node2/,webhdfs://node3:50070
```
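The parameter value is a single comma-separated string with no spaces, which is easy to mistype when the list is long. One way to build it from individual URIs is sketched below; `join_namenodes` is a hypothetical helper, and the URIs are the example values used on this page.

```shell
# Hypothetical helper: join its arguments with commas (no spaces).
# The function body runs in a subshell so the IFS change does not leak.
join_namenodes() (
  IFS=,
  printf '%s\n' "$*"
)

# Example URIs from this page; replace them with your own root directories.
NAMENODES=$(join_namenodes cfs://node1/ hdfs://node2/ webhdfs://node3:50070)

# Print the option to append to the spark-shell command line.
printf '%s\n' "--conf spark.yarn.access.namenodes=${NAMENODES}"
```

The same value can be passed as spark.dse.access.namenodes when using DSE Spark commands.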
