Starting Spark
How you start Spark depends on the installation and on whether you want to run in Spark mode, Spark and Hadoop mode, or SearchAnalytics mode:
- Installer-Services and Package installations
- To start the Spark trackers on a cluster of analytics nodes, edit the /etc/default/dse file to set SPARK_ENABLED to 1.
When you start DataStax Enterprise as a service, the node is launched as a Spark node. You can enable additional components.
Mode | Option in /etc/default/dse | Description |
Spark | SPARK_ENABLED=1 | Start the node in Spark mode. |
SearchAnalytics mode | SPARK_ENABLED=1 SEARCH_ENABLED=1 | In dse.yaml, cql_solr_query_paging: driver is required. |
Spark and Hadoop mode | SPARK_ENABLED=1 HADOOP_ENABLED=1 | Spark and Hadoop mode should be used only for development purposes. |
- Installer-No Services and Tarball installations:
- To start the Spark trackers on a cluster of analytics nodes, use the -k option:
dse cassandra -k
Note: Nodes started with -t or -k are automatically assigned to the default Analytics datacenter if you do not configure a datacenter in the snitch property file.
You can enable additional components:
Mode | Option | Description |
Spark | -k | Start the node in Spark mode. |
SearchAnalytics mode | -k -s | In dse.yaml, cql_solr_query_paging: driver is required. |
Spark and Hadoop mode | -k -t | Spark and Hadoop mode should be used only for development purposes. |
For example, to start a node in SearchAnalytics mode, use the -k -s options:
dse cassandra -k -s
SearchAnalytics mode is experimental, and is not recommended for production clusters.
To start a node in Spark and Hadoop mode, use the -k -t options:
dse cassandra -k -t
Spark and Hadoop mode should be used only for development purposes.
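For Installer-Services and Package installations, the SPARK_ENABLED edit can be scripted. The following is a minimal sketch, assuming the file already contains a line that starts with SPARK_ENABLED=. The enable_spark function and the DSE_DEFAULTS variable are hypothetical names, and the sketch runs against a scratch file so it can be tried without touching a real node.

```shell
# Hypothetical helper: set SPARK_ENABLED=1 in a DSE defaults file.
# Assumes the file already has a line starting with SPARK_ENABLED=.
enable_spark() {
    defaults_file="$1"
    tmp_file="$(mktemp)"
    # Rewrite only the SPARK_ENABLED line; leave every other line unchanged.
    sed 's/^SPARK_ENABLED=.*/SPARK_ENABLED=1/' "$defaults_file" > "$tmp_file" &&
        cat "$tmp_file" > "$defaults_file"
    rm -f "$tmp_file"
}

# Demonstrate on a scratch file rather than the real /etc/default/dse.
DSE_DEFAULTS="$(mktemp)"
printf 'HADOOP_ENABLED=0\nSPARK_ENABLED=0\nSEARCH_ENABLED=0\n' > "$DSE_DEFAULTS"
enable_spark "$DSE_DEFAULTS"
grep '^SPARK_ENABLED=' "$DSE_DEFAULTS"
```

On a real node, the same function would be pointed at /etc/default/dse and run with sufficient privileges, after which the DataStax Enterprise service must be restarted for the change to take effect.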
Starting the node with the Spark or Hadoop option starts a node that is designated as the Job Tracker, as shown by the Analytics(JT) workload in the output of the dsetool ring command:
dsetool ring
Note: Ownership information does not include topology, please specify a keyspace.
Address          DC         Rack   Workload       Status  State   Load      Owns    Token
10.160.137.165   Analytics  rack1  Analytics(JT)  Up      Normal  87.04 KB  33.33%  -9223372036854775808
10.168.193.41    Analytics  rack1  Analytics(TT)  Up      Normal  92.91 KB  33.33%  -3074457345618258603
10.176.83.32     Analytics  rack1  Analytics(TT)  Up      Normal  94.9 KB   33.33%  3074457345618258602
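To pick the Job Tracker out of this output in a script, you can filter on the Workload column. This is a minimal sketch, assuming the column layout shown in the sample output, where Workload is the fourth whitespace-separated field. The job_tracker_address function name is hypothetical, and the sample rows are copied from the output above.

```shell
# Hypothetical helper: print the address of the node whose workload is Analytics(JT).
# Reads `dsetool ring`-style output on standard input.
job_tracker_address() {
    awk '$4 == "Analytics(JT)" { print $1 }'
}

# Sample input copied from the output shown above. On a live cluster you
# would pipe the real command instead: dsetool ring | job_tracker_address
sample_ring='Address          DC         Rack   Workload       Status  State   Load      Owns    Token
10.160.137.165   Analytics  rack1  Analytics(JT)  Up      Normal  87.04 KB  33.33%  -9223372036854775808
10.168.193.41    Analytics  rack1  Analytics(TT)  Up      Normal  92.91 KB  33.33%  -3074457345618258603
10.176.83.32     Analytics  rack1  Analytics(TT)  Up      Normal  94.9 KB   33.33%  3074457345618258602'

printf '%s\n' "$sample_ring" | job_tracker_address
```

With the sample rows, the sketch prints 10.160.137.165.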
The default location of the dsetool command depends on the type of installation:
Package installations | /usr/bin/dsetool |
Installer-Services installations | /usr/bin/dsetool |
Installer-No Services and Tarball installations | install_location/bin/dsetool |
If you use sudo to start DataStax Enterprise, remove the ~/.spark directory before you restart the cluster:
sudo rm -r ~/.spark
Launching Spark
After starting a Spark node, use dse commands to launch Spark.
The default location of the dse tool depends on the type of installation:
Package installations | /usr/bin/dse |
Installer-Services installations | /usr/bin/dse |
Installer-No Services and Tarball installations | install_location/bin/dse |
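A script that must work on more than one installation type can check these locations in order. The following is a minimal sketch. The resolve_dse_tool function and the INSTALL_LOCATION variable are hypothetical names, with INSTALL_LOCATION standing in for the install_location placeholder in the table.

```shell
# Hypothetical helper: print the path of a DSE tool (dse or dsetool).
# $1 = tool name, $2 = install_location for Installer-No Services and Tarball installs.
resolve_dse_tool() {
    tool="$1"
    install_location="$2"
    if [ -x "/usr/bin/$tool" ]; then
        # Package and Installer-Services installations
        printf '%s\n' "/usr/bin/$tool"
    elif [ -n "$install_location" ] && [ -x "$install_location/bin/$tool" ]; then
        # Installer-No Services and Tarball installations
        printf '%s\n' "$install_location/bin/$tool"
    else
        echo "cannot find $tool" >&2
        return 1
    fi
}

# Example call; INSTALL_LOCATION is an assumed variable, empty if unset.
resolve_dse_tool dse "${INSTALL_LOCATION:-}" || true
```

The function returns a non-zero status when the tool is in neither location, so a calling script can stop early instead of failing later with a less specific error.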
You can use Cassandra-specific properties to start Spark. Spark binds to the listen_address that is specified in cassandra.yaml.
The location of the cassandra.yaml file depends on the type of installation:
Package installations | /etc/dse/cassandra/cassandra.yaml |
Tarball installations | install_location/resources/cassandra/conf/cassandra.yaml |
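To check which address Spark will bind to, you can read the listen_address value from cassandra.yaml. This is a minimal sketch, assuming the key appears uncommented at the start of a line with no trailing comment. The listen_address_of function name is hypothetical, and the sketch uses a scratch file with made-up contents rather than a real configuration.

```shell
# Hypothetical helper: print the listen_address value from a cassandra.yaml file.
listen_address_of() {
    # Match only an uncommented, top-level listen_address key.
    sed -n 's/^listen_address:[[:space:]]*//p' "$1"
}

# Scratch file standing in for cassandra.yaml; the values are illustrative.
yaml_file="$(mktemp)"
printf '# listen_address: commented\ncluster_name: Test\nlisten_address: 10.160.137.165\nrpc_address: 0.0.0.0\n' > "$yaml_file"
listen_address_of "$yaml_file"
```

On a node, pass the cassandra.yaml path for your installation type from the table above instead of the scratch file.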
DataStax Enterprise supports these commands for launching Spark on the DataStax Enterprise command line:
- dse spark
- Enters the interactive Spark shell, which offers basic autocompletion.
dse spark
- dse spark-submit
- Launches applications on a cluster like spark-submit. Replaces the deprecated dse spark-class command. Using this interface you can use Spark cluster managers without the need for separate configurations for each application. The syntax is:
dse spark-submit --class class_name jar_file other_options
For example, if you write a class that defines an option named d, enter the command as follows:
dse spark-submit --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d $NUM_SPARK_NODES
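When the class, JAR, and options come from variables, it can help to print the assembled command before running it. The following is a minimal dry-run sketch of the syntax above. The spark_submit_command function name and the NUM_SPARK_NODES value of 3 are assumptions for illustration; the function only prints the command and does not invoke dse.

```shell
# Hypothetical wrapper: print the dse spark-submit command line for a class and JAR.
# Arguments: class name, JAR path, then any application options.
spark_submit_command() {
    class_name="$1"
    jar_file="$2"
    shift 2
    # "$*" joins the remaining application options with single spaces.
    printf 'dse spark-submit --class %s %s %s\n' "$class_name" "$jar_file" "$*"
}

NUM_SPARK_NODES=3   # assumed value for illustration
spark_submit_command com.datastax.HttpSparkStream target/HttpSparkStream.jar -d "$NUM_SPARK_NODES"
```

With the assumed value, the sketch prints dse spark-submit --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d 3.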
The directory in which you run the dse Spark commands must be writable by the current user.
Internal authentication is supported.
DSE_USERNAME=user DSE_PASSWORD=secret dse spark[-submit]
These environment variables are supported for all Spark commands.
$ dse [-f config_file] [-u username -p password] [-a jmx_username -b jmx_password] spark[-submit]
-f config_file
is the path to a configuration file that stores credentials. If not specified, ~/.dserc is used if it exists. The configuration file can contain Cassandra and JMX login credentials. For example:
username=cassandra
password=cassandra
jmx_username=cassandra
jmx_password=jmx
The credentials in the configuration file are stored in clear text. DataStax recommends restricting access to this file to the specific user only.
--ssl
enables SSL encryption.
dse -u username
is the user name to authenticate against the configured Cassandra user.
dsetool -l username
is the user name to authenticate against the configured Cassandra user.
-p password
is the password to authenticate against the configured Cassandra user. If you do not provide a password on the command line, you are prompted to enter one.
-a jmx_username
is the user name for authenticating with secure JMX.
-b jmx_password
is the password for authenticating with secure JMX. If you do not provide a password on the command line, you are prompted to enter one.
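Because the credentials file is stored in clear text, it is worth creating it with owner-only permissions from the start. This is a minimal sketch of one way to do that. The write_dserc function name is hypothetical, the credentials are the placeholder values from the example above, and the sketch writes to a scratch path rather than ~/.dserc.

```shell
# Hypothetical helper: write a credentials file readable only by its owner.
# $1 = target path (for example ~/.dserc), $2 = user name, $3 = password.
write_dserc() {
    target="$1"
    # Create the file with owner-only permissions before writing secrets into it.
    ( umask 077 && : > "$target" )
    chmod 600 "$target"
    printf 'username=%s\npassword=%s\n' "$2" "$3" >> "$target"
}

# Demonstrate with a scratch path and the placeholder credentials from the example above.
dserc_file="$(mktemp -d)/dserc"
write_dserc "$dserc_file" cassandra cassandra
ls -l "$dserc_file" | cut -c1-10
```

The last line prints the permission string of the new file, which should show read and write access for the owner only.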
The location of the dse.yaml file depends on the type of installation:
Installer-Services | /etc/dse/dse.yaml |
Package installations | /etc/dse/dse.yaml |
Installer-No Services | install_location/resources/dse/conf/dse.yaml |
Tarball installations | install_location/resources/dse/conf/dse.yaml |