Accessing HDFS or CFS resources using Kerberos authentication

HDFS or CFS resources can be accessed from BYOS nodes using Kerberos authentication.

If you are using Kerberos authentication and need to access HDFS or CFS data from BYOS nodes, follow these steps to configure DSE and Spark.

Procedure

  1. Copy hdfs-site.xml from your Hadoop configuration directory to the DSE Hadoop configuration directory.
    The default location of the Hadoop configuration files depends on the type of installation:

    Installer-Services and Package installations:
      /etc/dse/hadoop/conf
      /etc/dse/resources/hadoop2-client/conf

    Installer-No Services and Tarball installations:
      install_location/resources/hadoop/conf/
      install_location/resources/hadoop2-client/conf

    scp hdfs-site.xml admin@dsenode:/etc/dse/hadoop2-client/conf/
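    After copying the file, you can optionally confirm from the Spark shell that its settings are visible to Spark's Hadoop configuration. The sketch below is illustrative only: dfs.namenode.kerberos.principal is a standard Hadoop property, but your hdfs-site.xml may define different keys.

    // Illustrative Scala check, run inside the `dse spark` shell.
    // `sc` is the SparkContext the shell provides.
    // The property name is an example; inspect whichever keys your
    // hdfs-site.xml actually sets.
    val principal = sc.hadoopConfiguration.get("dfs.namenode.kerberos.principal")
    println(s"NameNode Kerberos principal: $principal")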
  2. Pass a comma-separated list of HDFS or CFS root directories with the spark.dse.access.namenodes parameter when using DSE Spark commands.

    The spark.dse.access.namenodes parameter has the same effect as spark.yarn.access.namenodes in standalone Spark.

    The Spark application must have access to the listed nodes, and Kerberos must be properly configured to authenticate to them: the nodes must be in the same realm or in a trusted realm.

    DSE Spark acquires security tokens for each of the listed namenodes so that the Spark application can access those remote HDFS or CFS clusters.

    dse spark --conf spark.dse.access.namenodes=cfs://node1/,hdfs://node2/,webhdfs://node3:50070
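    With the shell started this way, paths on the listed filesystems can be read directly, because DSE has already obtained the required tokens. A minimal sketch follows; the file paths are hypothetical and should be replaced with locations that exist on your cluster.

    // Minimal Scala sketch, run inside the `dse spark` shell started above.
    // The file path is hypothetical; substitute one that exists on your cluster.
    val events = sc.textFile("hdfs://node2/data/events.txt")
    println(s"Line count: ${events.count()}")

    // CFS URIs from the same list are addressed the same way.
    val cfsEvents = sc.textFile("cfs://node1/data/events.txt")
    println(s"CFS line count: ${cfsEvents.count()}")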
  3. Pass a comma-separated list of HDFS or CFS root directories with the spark.yarn.access.namenodes parameter when using standalone Spark commands.
    spark-shell --master yarn-client --jars dse-byos*.jar --properties-file merged.conf --conf spark.yarn.access.namenodes=cfs://node1/,hdfs://node2/,webhdfs://node3:50070
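    The same access pattern applies in the standalone shell. A minimal sketch with hypothetical paths reads from one listed filesystem and writes to another, which only succeeds if tokens were obtained for both namenodes.

    // Minimal Scala sketch, run inside the standalone spark-shell started above.
    // Both paths are hypothetical; replace them with real locations.
    val input = sc.textFile("hdfs://node2/data/input.txt")
    input.saveAsTextFile("cfs://node1/data/output")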