Starting Spark
How you start Spark depends on the installation and if you want to run in Hadoop mode or SearchAnalytics mode:
- Installer-Services and Package installations
- To start the Spark trackers on a cluster of analytics nodes, edit the /etc/default/dse file to set SPARK_ENABLED to 1.
When you start DataStax Enterprise as a service, the node is launched as a Spark node.
To start a node in Spark and Hadoop mode, edit the /etc/default/dse file to set HADOOP_ENABLED and SPARK_ENABLED to 1.
Spark and Hadoop mode should be used only for development purposes.
To start a node in SearchAnalytics mode, edit the /etc/default/dse file to set SPARK_ENABLED and SEARCH_ENABLED to 1.
SearchAnalytics mode is experimental, and not recommended for production clusters.
- Installer-No Services and Tarball installations:
- To start the Spark trackers on a cluster of analytics nodes, use the -k option:
$ dse cassandra -k
To start a node in Spark and Hadoop mode, use the -k and -t options:
$ dse cassandra -k -t
Spark and Hadoop mode should only be used for development purposes.
Nodes started with -t or -k are automatically assigned to the default Analytics data center if you do not configure a data center in the snitch property file.
To start a node in SearchAnalytics mode, use the -k and -s options.
$ dse cassandra -k -s
SearchAnalytics mode is experimental, and not recommended for production clusters.
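For package and service installations, the /etc/default/dse entries described above look like the following sketch; enable only the combination you need:

```shell
# /etc/default/dse (package and service installations)
# Spark-only (Analytics) node:
SPARK_ENABLED=1
# Set HADOOP_ENABLED=1 as well for Spark and Hadoop mode (development only),
# or SEARCH_ENABLED=1 instead for SearchAnalytics mode (experimental).
HADOOP_ENABLED=0
SEARCH_ENABLED=0
```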
Starting the node with the Spark or Hadoop option starts a node that is designated as the Job Tracker, as shown by the Analytics(JT) workload in the output of the dsetool ring command:
$ dsetool ring
Note: Ownership information does not include topology, please specify a keyspace.
Address         DC         Rack   Workload       Status  State   Load      Owns    Token
10.160.137.165  Analytics  rack1  Analytics(JT)  Up      Normal  87.04 KB  33.33%  -9223372036854775808
10.168.193.41   Analytics  rack1  Analytics(TT)  Up      Normal  92.91 KB  33.33%  -3074457345618258603
10.176.83.32    Analytics  rack1  Analytics(TT)  Up      Normal  94.9 KB   33.33%  3074457345618258602
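As a quick sanity check, you can pick the Job Tracker node out of the Workload column. The helper below is a sketch that simulates the ring output shown above with a shell variable; in practice, pipe the output of a live dsetool ring command instead:

```shell
# Simulated `dsetool ring` data rows (in practice: dsetool ring | awk ...)
ring_output='10.160.137.165  Analytics  rack1  Analytics(JT)  Up  Normal  87.04 KB  33.33%
10.168.193.41   Analytics  rack1  Analytics(TT)  Up  Normal  92.91 KB  33.33%
10.176.83.32    Analytics  rack1  Analytics(TT)  Up  Normal  94.9 KB   33.33%'

# Field 4 is the Workload column; print the address of the Analytics(JT) node.
jt_node=$(printf '%s\n' "$ring_output" | awk '$4 == "Analytics(JT)" {print $1}')
echo "Job Tracker: $jt_node"
```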
The default location of the dsetool command depends on the type of installation:
Package installations | /usr/bin/dsetool
Installer-Services installations | /usr/bin/dsetool
Installer-No Services and Tarball installations | install_location/bin/dsetool
To clear the user-specific Spark directory, remove ~/.spark:
$ sudo rm -r ~/.spark
Launching Spark
After starting a Spark node, use dse commands to launch Spark.
The default location of the dse command depends on the type of installation:
Package installations | /usr/bin/dse
Installer-Services installations | /usr/bin/dse
Installer-No Services and Tarball installations | install_location/bin/dse
You can use Cassandra-specific properties to start Spark. Spark binds to the listen_address that is specified in cassandra.yaml.
The default location of the cassandra.yaml file depends on the type of installation:
Package installations | /etc/dse/cassandra/cassandra.yaml
Tarball installations | install_location/resources/cassandra/conf/cassandra.yaml
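For example, Spark binds to whatever address is set here; the address below is a placeholder:

```yaml
# cassandra.yaml — Spark binds to this address
listen_address: 10.160.137.165
```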
DataStax Enterprise supports these commands for launching Spark on the DataStax Enterprise command line:
- dse spark
- Enters the interactive Spark shell and offers basic autocompletion.
$ dse spark
- dse spark-submit
- Launches applications on a cluster, like the Spark spark-submit command, and replaces the deprecated dse spark-class command. This interface lets you use Spark cluster managers without a separate configuration for each application. The syntax is:
$ dse spark-submit --class <class name> <jar file> <other_options>
For example, if you write a class that defines an option named d, enter the command as follows:
$ dse spark-submit --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d $NUM_SPARK_NODES
To use a user name and password to run an application, use the following syntax:
$ dse -u <username> -p <password> spark[-submit]
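For example, reusing the application above (the user name and password here are placeholders):
$ dse -u admin -p mypassword spark-submit --class com.datastax.HttpSparkStream target/HttpSparkStream.jar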