Starting Spark

Before you start Spark, set RPC permissions for the DseClientTool object.
Note: RPC permission for the DseClientTool object is required to run Spark because the DseClientTool object is called implicitly by the Spark launcher.
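For example, a minimal sketch of granting that permission with cqlsh, assuming authorization is enabled and a role named spark_user (the role name is illustrative, and the exact permission syntax can vary by DSE version):

GRANT EXECUTE ON REMOTE OBJECT DseClientTool TO spark_user;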

How you start Spark depends on the type of installation and whether you want to run in Spark mode or SearchAnalytics mode:

Installer-Services and Package installations
To start the Spark trackers on a cluster of analytics nodes, edit the /etc/default/dse file to set SPARK_ENABLED to 1.

When you start DataStax Enterprise as a service, the node is launched as a Spark node. You can enable additional components.

Mode              Option in /etc/default/dse   Description
Spark             SPARK_ENABLED=1              Start the node in Spark mode.
SearchAnalytics   SPARK_ENABLED=1              Start the node in SearchAnalytics mode. In dse.yaml,
                  SEARCH_ENABLED=1             cql_solr_query_paging: driver is required.

Note: SearchAnalytics mode requires testing in your environment before it is used in production clusters.
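For example, to run a service-based node in SearchAnalytics mode, set both options in /etc/default/dse:

SPARK_ENABLED=1
SEARCH_ENABLED=1

Then start DataStax Enterprise as a service (on SysV-style package installations this is typically):

sudo service dse start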
Installer-No Services and Tarball installations:
To start the Spark trackers on a cluster of analytics nodes, use the -k option:
dse cassandra -k
Note: Nodes started with -k are automatically assigned to the default Analytics datacenter if you do not configure a datacenter in the snitch property file.
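For example, if the cluster uses GossipingPropertyFileSnitch (an assumption; your snitch may differ), the datacenter and rack are set in cassandra-rackdc.properties:

dc=Analytics
rack=rack1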

You can enable additional components:
Mode              Option   Description
Spark             -k       Start the node in Spark mode.
SearchAnalytics   -k -s    Start the node in SearchAnalytics mode. In dse.yaml,
                           cql_solr_query_paging: driver is required.
For example, to start a node in SearchAnalytics mode, use the -k -s options:

dse cassandra -k -s

SearchAnalytics mode is experimental and is not recommended for production clusters.

Starting a node with the Spark option designates it as the Job Tracker, as shown by the Analytics(JT) workload in the output of the dsetool ring command:

dsetool ring

Note: Ownership information does not include topology, please specify a keyspace. 
Address          DC           Rack   Workload      Status  State    Load      Owns   Token                       
10.160.137.165   Analytics    rack1  Analytics(JT)    Up   Normal   87.04 KB  33.33% -9223372036854775808                        
10.168.193.41    Analytics    rack1  Analytics(TT)    Up   Normal   92.91 KB  33.33% -3074457345618258603                        
10.176.83.32     Analytics    rack1  Analytics(TT)    Up   Normal   94.9 KB   33.33% 3074457345618258602
The default location of the dsetool command depends on the type of installation:
Package installations /usr/bin/dsetool
Installer-Services installations /usr/bin/dsetool
Installer-No Services and Tarball installations install_location/bin/dsetool
If you use sudo to start DataStax Enterprise, remove the ~/.spark directory before you restart the cluster:
sudo rm -r ~/.spark

Launching Spark 

After starting a Spark node, use dse commands to launch Spark.

The default location of the dse tool depends on the type of installation:
Package installations /usr/bin/dse
Installer-Services installations /usr/bin/dse
Installer-No Services and Tarball installations install_location/bin/dse

You can use Cassandra-specific properties to start Spark. Spark binds to the listen_address that is specified in cassandra.yaml.

The location of the cassandra.yaml file depends on the type of installation:
Installer-Services /etc/dse/cassandra/cassandra.yaml
Package installations /etc/dse/cassandra/cassandra.yaml
Installer-No Services install_location/resources/cassandra/conf/cassandra.yaml
Tarball installations install_location/resources/cassandra/conf/cassandra.yaml
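For example, the relevant cassandra.yaml entry looks like this (the address shown is illustrative and matches the sample dsetool ring output above):

listen_address: 10.160.137.165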

DataStax Enterprise supports these commands for launching Spark on the DataStax Enterprise command line:

dse spark
Enters the interactive Spark shell, which offers basic autocompletion. The syntax is:
dse spark
dse spark-submit
Launches applications on a cluster, like spark-submit. It replaces the deprecated dse spark-class command. With this interface you can use Spark cluster managers without separate configurations for each application. The syntax is:
dse spark-submit --class class_name jar_file other_options
For example, if you write a class that defines an option named d, enter the command as follows:
dse spark-submit --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d $NUM_SPARK_NODES
Note: The JAR file can be located in a DSEFS directory. If the DSEFS cluster is secured, provide authentication credentials as described in DSEFS authentication.
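For example, a sketch of submitting the same application from a DSEFS location (the dsefs:// path is illustrative):

dse spark-submit --class com.datastax.HttpSparkStream dsefs:///jars/HttpSparkStream.jar -d $NUM_SPARK_NODES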
Note: The directory in which you run the dse Spark commands must be writable by the current user.

Internal authentication is supported.

Use the optional environment variables DSE_USERNAME and DSE_PASSWORD to increase security and to prevent the user name and password from appearing in the Spark log files or in the process list on the Spark Web UI. To specify a user name and password using environment variables, add the following to your Bash .profile or .bash_profile:
export DSE_USERNAME=user
export DSE_PASSWORD=secret
These environment variables are supported for all Spark and dse client-tool commands.
Note: DataStax recommends using the environment variables instead of passing user credentials on the command line.

You can provide authentication credentials in several ways; see Credentials for authentication.
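For example, credentials can also be passed directly on the dse command line, although they are then visible in the process list (the user name and password shown are illustrative):

dse -u user -p secret spark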

The location of the dse.yaml file depends on the type of installation:
Installer-Services /etc/dse/dse.yaml
Package installations /etc/dse/dse.yaml
Installer-No Services install_location/resources/dse/conf/dse.yaml
Tarball installations install_location/resources/dse/conf/dse.yaml