Starting Spark
Note: The DseClientTool object is required to run Spark because it is called implicitly by the Spark launcher.
How you start Spark depends on the installation and whether you want to run in Spark mode or SearchAnalytics mode:
- Installer-Services and Package installations
- To start the Spark trackers on a cluster of analytics nodes, edit the
/etc/default/dse file to set SPARK_ENABLED to 1.
When you start DataStax Enterprise as a service, the node is launched as a Spark node.
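For example, a typical sequence on a package installation might look like the following; this sketch assumes the service is managed with service under the name dse:
# In /etc/default/dse, set:
SPARK_ENABLED=1
# Restart the service so the node comes up in Spark mode:
sudo service dse restart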
You can enable additional components:
Mode | Option in /etc/default/dse | Description
Spark | SPARK_ENABLED=1 | Start the node in Spark mode.
SearchAnalytics mode | SPARK_ENABLED=1 SEARCH_ENABLED=1 | SearchAnalytics mode requires testing in your environment before it is used in production clusters. In dse.yaml, cql_solr_query_paging: driver is required.
- Installer-No Services and Tarball installations:
- To start the Spark trackers on a cluster of analytics nodes, use the -k option:
dse cassandra -k
Note: Nodes started with -k are automatically assigned to the default Analytics datacenter if you do not configure a datacenter in the snitch property file.
You can enable additional components:
Mode | Option | Description
Spark | -k | Start the node in Spark mode.
SearchAnalytics mode | -k -s | In dse.yaml, cql_solr_query_paging: driver is required.
To start a node in SearchAnalytics mode, use the -k -s options:
dse cassandra -k -s
SearchAnalytics mode is experimental, and is not recommended for production clusters.
Starting the node with the Spark option starts a node that is designated as the Job Tracker, as shown by the Analytics(JT) workload in the output of the dsetool ring command:
dsetool ring
Note: Ownership information does not include topology, please specify a keyspace.
Address          DC         Rack   Workload       Status  State   Load      Owns    Token
10.160.137.165   Analytics  rack1  Analytics(JT)  Up      Normal  87.04 KB  33.33%  -9223372036854775808
10.168.193.41    Analytics  rack1  Analytics(TT)  Up      Normal  92.91 KB  33.33%  -3074457345618258603
10.176.83.32     Analytics  rack1  Analytics(TT)  Up      Normal  94.9 KB   33.33%  3074457345618258602
The location of the dsetool command depends on the type of installation:
Package installations | /usr/bin/dsetool |
Installer-Services installations | /usr/bin/dsetool |
Installer-No Services and Tarball installations | install_location/bin/dsetool |
If you use sudo to start DataStax Enterprise, remove the ~/.spark directory before you restart the cluster:
sudo rm -r ~/.spark
Launching Spark
After starting a Spark node, use dse
commands to launch Spark.
The location of the dse tool depends on the type of installation:
Package installations | /usr/bin/dse |
Installer-Services installations | /usr/bin/dse |
Installer-No Services and Tarball installations | install_location/bin/dse |
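For example, on an Installer-No Services or Tarball installation you can launch the Spark shell with the full path; install_location stands for your installation directory:
install_location/bin/dse spark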
You can use Cassandra-specific properties to start Spark. Spark binds to the listen_address that is specified in cassandra.yaml. The default location of the cassandra.yaml file depends on the type of installation:
Installer-Services | /etc/dse/cassandra/cassandra.yaml |
Package installations | /etc/dse/cassandra/cassandra.yaml |
Installer-No Services | install_location/resources/cassandra/conf/cassandra.yaml |
Tarball installations | install_location/resources/cassandra/conf/cassandra.yaml |
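For example, a minimal cassandra.yaml fragment; the address shown is illustrative and matches the first node in the ring output above:
listen_address: 10.160.137.165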
DataStax Enterprise supports these commands for launching Spark on the DataStax Enterprise command line:
- dse spark
- Enters the interactive Spark shell, which offers basic autocompletion.
dse spark
- dse spark-submit
- Launches applications on a cluster like spark-submit. Replaces the deprecated
dse spark-class
command. Using this interface you can use Spark cluster managers without the need for separate configurations for each application. The syntax is:
dse spark-submit --class class_name jar_file other_options
For example, if you write a class that defines an option named d, enter the command as follows:
dse spark-submit --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d $NUM_SPARK_NODES
Note: The JAR file can be located in a DSEFS directory. If the DSEFS cluster is secured, provide authentication credentials as described in DSEFS authentication.
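For example, a submission that loads the application JAR from a DSEFS directory might look like the following; the class name and dsefs: path are hypothetical:
dse spark-submit --class com.example.MyApp dsefs:///jars/myapp.jar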
The directory in which you run dse Spark commands must be writable by the current user.
Internal authentication is supported.
Use the environment variables DSE_USERNAME and DSE_PASSWORD to increase security and prevent the user name and password from appearing in the Spark log files or in the process list on the Spark Web UI. To specify a user name and password using environment variables, add the following to your Bash .profile or .bash_profile:
export DSE_USERNAME=user
export DSE_PASSWORD=secret
These environment variables are supported for all Spark and dse client-tool commands.
You can provide authentication credentials in several ways; see Credentials for authentication.
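For example, credentials can also be supplied directly on the dse command line; the user name and password shown are placeholders:
dse -u user -p secret spark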
The default location of the dse.yaml file depends on the type of installation:
Installer-Services | /etc/dse/dse.yaml |
Package installations | /etc/dse/dse.yaml |
Installer-No Services | install_location/resources/dse/conf/dse.yaml |
Tarball installations | install_location/resources/dse/conf/dse.yaml |
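As noted above, SearchAnalytics mode requires driver paging to be set in dse.yaml; a minimal fragment:
cql_solr_query_paging: driver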