Starting Spark

How you start Spark depends on the type of installation and on whether you want to run in Hadoop mode or SearchAnalytics mode:

Installer-Services and Package installations:
To start the Spark trackers on a cluster of analytics nodes, edit the /etc/default/dse file to set SPARK_ENABLED to 1.

When you start DataStax Enterprise as a service, the node is launched as a Spark node.

To start a node in Spark and Hadoop mode, edit the /etc/default/dse file to set HADOOP_ENABLED and SPARK_ENABLED to 1.

Spark and Hadoop mode should be used only for development purposes.

To start a node in SearchAnalytics mode, edit the /etc/default/dse file to set SPARK_ENABLED and SEARCH_ENABLED to 1.

SearchAnalytics mode is experimental, and not recommended for production clusters.
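
For example, to run package-installation nodes in Spark-only mode, the relevant lines of /etc/default/dse might look like the following minimal sketch (combine SPARK_ENABLED with HADOOP_ENABLED or SEARCH_ENABLED for the other modes):

HADOOP_ENABLED=0
SPARK_ENABLED=1
SEARCH_ENABLED=0

After saving the file, start DataStax Enterprise as a service, for example:
$ sudo service dse start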

Installer-No Services and Tarball installations:
To start the Spark trackers on a cluster of analytics nodes, use the -k option:
$ dse cassandra -k
To start a node in Spark and Hadoop mode, use the -k and -t options:
$ dse cassandra -k -t

Spark and Hadoop mode should only be used for development purposes.

Nodes started with -t or -k are automatically assigned to the default Analytics data center if you do not configure a data center in the snitch property file.
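
If you want these nodes in a data center other than the default, one way to do it (a sketch, assuming the GossipingPropertyFileSnitch is in use) is to set the data center and rack in cassandra-rackdc.properties on each node; the names below are hypothetical:

dc=AnalyticsDC
rack=rack1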

To start a node in SearchAnalytics mode, use the -k and -s options:
$ dse cassandra -k -s

SearchAnalytics mode is experimental, and not recommended for production clusters.

Starting a node with the Spark or Hadoop option designates it as the Job Tracker node, as shown by the Analytics(JT) workload in the output of the dsetool ring command:

$ dsetool ring

Note: Ownership information does not include topology, please specify a keyspace. 
Address          DC           Rack   Workload      Status  State    Load      Owns   Token                       
10.160.137.165   Analytics    rack1  Analytics(JT)    Up   Normal   87.04 KB  33.33% -9223372036854775808                        
10.168.193.41    Analytics    rack1  Analytics(TT)    Up   Normal   92.91 KB  33.33% -3074457345618258603                        
10.176.83.32     Analytics    rack1  Analytics(TT)    Up   Normal   94.9 KB   33.33% 3074457345618258602
The default location of the dsetool command depends on the type of installation:
Package installations /usr/bin/dsetool
Installer-Services installations /usr/bin/dsetool
Installer-No Services and Tarball installations install_location/bin/dsetool
If you use sudo to start DataStax Enterprise, remove the ~/.spark directory before you restart the cluster:
$ sudo rm -r ~/.spark

Launching Spark 

After starting a Spark node, use dse commands to launch Spark.

The default location of the dse tool depends on the type of installation:
Package installations /usr/bin/dse
Installer-Services installations /usr/bin/dse
Installer-No Services and Tarball installations install_location/bin/dse

You can use Cassandra-specific properties to start Spark. Spark binds to the listen_address that is specified in cassandra.yaml.

The location of the cassandra.yaml file depends on the type of installation:
Package installations /etc/dse/cassandra/cassandra.yaml
Tarball installations install_location/resources/cassandra/conf/cassandra.yaml
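
For example, a node's listen_address entry in cassandra.yaml might look like this (the address shown is only an illustration):

listen_address: 10.160.137.165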

DataStax Enterprise supports these commands for launching Spark on the DataStax Enterprise command line:

dse spark
Enters the interactive Spark shell, which offers basic autocompletion.
$ dse spark 
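
Inside the shell started by dse spark, you can work with Cassandra data through the preconfigured Spark context; the keyspace and table names below are hypothetical, and this sketch assumes the Cassandra connector implicits are already available in the shell:
scala> sc.cassandraTable("my_keyspace", "my_table").count
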
dse spark-submit
Launches applications on a cluster, like spark-submit. It replaces the deprecated dse spark-class command. With this interface you can use the Spark cluster managers without needing separate configurations for each application. The syntax is:
$ dse spark-submit --class <class name> <jar file> <other_options>
For example, if you write a class that defines an option named d, enter the command as follows:
$ dse spark-submit --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d $NUM_SPARK_NODES
Note: The directory in which you run the dse Spark commands must be writable by the current user.

To run an application with a user name and password, use the following syntax:

$ dse -u <username> -p <password> spark[-submit]
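For example, with a hypothetical user name and password:
$ dse -u analytics_user -p mypassword spark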