Running Spark commands against a remote cluster

To run Spark commands against a remote cluster, you must export the DSE configuration from one of the remote nodes to the local client machine.

The default location of the Hadoop configuration files depends on the type of installation:

  Installer-Services and Package installations:
    /etc/dse/hadoop/conf
    /etc/dse/resources/hadoop2-client/conf

  Installer-No Services and Tarball installations:
    install_location/resources/hadoop/conf/
    install_location/resources/hadoop2-client/conf

To run a driver application remotely, there must be full public network communication between the remote nodes and the client machine: the Spark executors running on the remote nodes must be able to connect back to the driver on the client machine.
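The address the executors use to reach the driver can also be set inside the application itself rather than on the spark-submit command line (the command-line form is shown in the procedure below). The following is a minimal sketch using the standard Spark properties spark.driver.host and spark.driver.port; the address 203.0.113.10 and port 36000 are hypothetical placeholders for the client machine's publicly reachable address and an open port:

  import org.apache.spark.{SparkConf, SparkContext}

  object RemoteDriverExample {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf()
        .setAppName("RemoteDriverExample")
        // Address the remote executors use to connect back to this driver;
        // 203.0.113.10 is a placeholder for the client machine's public IP.
        .set("spark.driver.host", "203.0.113.10")
        // Optionally pin the driver port so a single firewall rule suffices.
        .set("spark.driver.port", "36000")

      val sc = new SparkContext(conf)
      println(s"Driver listening on ${sc.getConf.get("spark.driver.host")}")
      sc.stop()
    }
  }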

Procedure

  1. Export the DataStax Enterprise client configuration from the remote node to the client node:
    1. On the remote node:
      dse client-tool configuration export dse-config.jar
      Copy the exported JAR to the client nodes:
      scp dse-config.jar user@clientnode1.example.com:
    3. On the client node:
      dse client-tool configuration import dse-config.jar
  2. Run the Spark command against the remote node.
    dse spark-submit submit_options myApplication.jar

    To set the driver host to a publicly accessible IP address, pass the spark.driver.host configuration property on the command line.

    dse spark-submit --conf spark.driver.host=IP_address myApplication.jar
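For reference, myApplication.jar above stands for any user-supplied Spark application. A minimal, hypothetical Scala application of that kind might look like the following (the object name and the computation are illustrative only); the cluster connection details are supplied by dse spark-submit, so the application does not need to set a master URL:

  import org.apache.spark.{SparkConf, SparkContext}

  object MyApplication {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("MyApplication"))

      // Trivial job: sum the numbers 1 to 100 on the executors.
      val total = sc.parallelize(1 to 100).sum()
      println(s"Sum of 1..100 = $total")

      sc.stop()
    }
  }

Once compiled and packaged into a JAR, an application like this can be submitted with the dse spark-submit command shown above.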