Running Spark commands against a remote cluster

To run Spark commands against a remote cluster, you must copy your Hadoop configuration files from one of the remote nodes to the local client machine.

The default location of the Hadoop configuration files depends on the type of installation:

  * Installer-Services and Package installations: /etc/dse/hadoop/
  * Installer-No Services and Tarball installations: install_location/resources/hadoop/conf/

To run a driver application remotely, there must be full public network communication between the remote nodes and the client machine.

Procedure

  1. Copy the files from the remote node to the local machine.

    On a services or package install of DataStax Enterprise:

    $ cd /etc/dse/hadoop
    $ scp adminuser@node1:/etc/dse/hadoop/* .
  2. Optional: Edit the copied XML configuration files to ensure that the IP addresses listed for the Cassandra nodes are publicly accessible.
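One way to review the copied files is to search them for private (RFC 1918) addresses, which a remote client cannot reach. The following is a hedged sketch of that check; the two sample files exist only to make it self-contained, and on a real client you would run the grep inside the directory you copied the files to (for example /etc/dse/hadoop).

```shell
# Hypothetical check: list copied Hadoop XML files that still reference
# private (RFC 1918) addresses. The sample files below stand in for the
# configuration copied from the remote node.
cd "$(mktemp -d)"
printf '<value>192.168.1.5:9042</value>\n' > core-site.xml    # private address
printf '<value>198.51.100.7:9042</value>\n' > mapred-site.xml # public address

# Match 10.x.x.x, 172.16-31.x.x, and 192.168.x.x; print only the file names.
grep -E -l '(^|[^0-9])(10|192\.168|172\.(1[6-9]|2[0-9]|3[01]))\.' *.xml
```

Any file the command prints still contains a private address and should be edited before you submit the application.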
  3. Run the Spark command against the remote node.
    $ dse spark-submit [submit options] myApplication.jar

    To set the driver host to a publicly accessible IP address, pass in the spark.driver.host option.

    $ dse spark-submit --conf spark.driver.host=IP_address myApplication.jar
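As a minimal sketch of filling in that option, the snippet below detects the client's first address and prints the resulting submit command. The use of hostname -I (Linux-specific) and the jar name myApplication.jar are assumptions; substitute the address that the remote nodes can actually reach.

```shell
# Hypothetical sketch: pick this client's first address and print the
# spark-submit invocation that would set it as the Spark driver host.
# "hostname -I" and "myApplication.jar" are assumptions for illustration.
DRIVER_IP=$(hostname -I | awk '{print $1}')
printf 'dse spark-submit --conf spark.driver.host=%s myApplication.jar\n' "$DRIVER_IP"
```

If the address printed is not publicly routable, use the public IP of the client machine instead, since the remote Spark executors must be able to connect back to the driver.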