Starting Spark
How you start Spark depends on the installation type and on whether you want to run in Hadoop mode:
- Installer-Services and Package installations: To start the Spark trackers on a cluster of Analytics nodes, edit the /etc/default/dse file to set SPARK_ENABLED to 1.
When you start DataStax Enterprise as a service, the node is launched as a Spark node.
To start a node in Spark and Hadoop mode, edit the /etc/default/dse file to set HADOOP_ENABLED and SPARK_ENABLED to 1.
- Installer-No Services and Tarball installations: To start the Spark trackers on a cluster of Analytics nodes, use the -k option:
$ dse cassandra -k
To start a node in Spark and Hadoop mode, use the -k and -t options:
$ dse cassandra -k -t
Nodes started with either -t or -k are automatically assigned to the default Analytics data center if you do not configure a data center in the snitch property file.
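For the service-based installations described above, the edit to /etc/default/dse can be sketched as follows. This is a minimal illustration, not the installer's own procedure: it assumes the file holds plain KEY=value lines such as SPARK_ENABLED=0, and it works on a scratch copy (the conf_file variable is hypothetical) so that it is safe to run anywhere. On a real node you would apply the same substitution to /etc/default/dse with sudo.

```shell
# Work on a scratch copy; on a real node the target would be /etc/default/dse.
# Assumption: the file uses plain KEY=value lines.
conf_file="$(mktemp)"
printf 'HADOOP_ENABLED=0\nSPARK_ENABLED=0\n' > "$conf_file"

# Set SPARK_ENABLED to 1 and leave every other line untouched.
sed 's/^SPARK_ENABLED=.*/SPARK_ENABLED=1/' "$conf_file" > "$conf_file.new"
mv "$conf_file.new" "$conf_file"

# For Spark and Hadoop mode, also set HADOOP_ENABLED to 1.
sed 's/^HADOOP_ENABLED=.*/HADOOP_ENABLED=1/' "$conf_file" > "$conf_file.new"
mv "$conf_file.new" "$conf_file"

# Show the resulting settings.
cat "$conf_file"
```

Writing to a temporary file and then moving it into place avoids relying on sed -i, whose syntax differs between platforms.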
Starting the node with the Spark or Hadoop options starts a node designated as the job tracker, as shown by the Analytics(JT) workload in the output of the dsetool ring command:
$ dsetool ring
Note: Ownership information does not include topology, please specify a keyspace.
Address         DC         Rack   Workload       Status  State   Load      Owns    Token
10.160.137.165  Analytics  rack1  Analytics(JT)  Up      Normal  87.04 KB  33.33%  -9223372036854775808
10.168.193.41   Analytics  rack1  Analytics(TT)  Up      Normal  92.91 KB  33.33%  -3074457345618258603
10.176.83.32    Analytics  rack1  Analytics(TT)  Up      Normal  94.9 KB   33.33%  3074457345618258602
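If you need the job tracker address in a script, one possible approach is to filter the dsetool ring output on the Workload column. The sketch below runs against the sample rows shown above rather than a live cluster; the variable names ring_output and jt_node are illustrative, and the sketch assumes the column order stays as in that sample.

```shell
# Sample rows copied from the dsetool ring output shown above.
ring_output='Address         DC         Rack   Workload       Status  State   Load      Owns    Token
10.160.137.165  Analytics  rack1  Analytics(JT)  Up      Normal  87.04 KB  33.33%  -9223372036854775808
10.168.193.41   Analytics  rack1  Analytics(TT)  Up      Normal  92.91 KB  33.33%  -3074457345618258603
10.176.83.32    Analytics  rack1  Analytics(TT)  Up      Normal  94.9 KB   33.33%  3074457345618258602'

# Field 4 is the Workload column; print the address of the job tracker node.
jt_node="$(printf '%s\n' "$ring_output" | awk '$4 == "Analytics(JT)" { print $1 }')"
echo "Job tracker: $jt_node"
```

On a running node, the same awk filter could be applied to the live output by piping dsetool ring into it.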
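If you use sudo to start DataStax Enterprise, remove the ~/.spark directory before you restart the cluster:
$ sudo rm -r ~/.spark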
Launching Spark
After starting a Spark node, use dse commands to launch Spark. For example, on Linux from the installation directory use the following syntax:
$ bin/<dse command>
You can use the Cassandra-specific properties to start Spark.
DataStax Enterprise supports these commands for launching Spark on the DataStax Enterprise command line:
- dse spark
- Enters the interactive Spark shell, which offers basic autocompletion.
- dse spark-submit
- Launches applications on a cluster, like spark-submit. Replaces the deprecated dse spark-class command. Using this interface, you can use Spark cluster managers without the need for separate configurations for each application. The syntax is:
$ dse spark-submit --class <class name> <jar file> <other_options>
For example, if you write a class that defines an option named d, enter the command as follows:
$ dse spark-submit --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d $NUM_SPARK_NODES
- dse spark-submit-with-cc
- Launches a Spark program in batch mode and generates the Cassandra context. Replaces the deprecated dse spark-class-with-cc command. You can pass configuration arguments to Spark using this command.
- dse spark-with-cc
- Enters the interactive Spark shell and generates the Cassandra context. This feature is deprecated and might be modified or removed in the future. You can pass configuration arguments to Spark using this command.
- dse spark-schema
- Generates a Cassandra context JAR. This feature is deprecated and might be modified or removed in the future.
To use a user name and password to run an application, use the following syntax:
$ dse -u <username> -p <password> spark[-submit]
Generating a Cassandra context from a file
You can specify the following additional options when using dse spark-schema:
- --force
Forces recompilation of all the sources in the Cassandra context.
- --output=...
Path to the output directory where the Cassandra context is to be generated. If not specified, the SPARK_CASSANDRA_CONTEXT_DIR environment variable is used.
- --script=...
Path to a CQL script. If specified, the context classes are generated from the schema provided in that CQL file rather than from the current schema in Cassandra. A running Cassandra instance is not required.
Using the dse spark-schema command, you can generate the Cassandra context to a specified directory. You can base the context on a script that contains arbitrary CQL statements and comments. However, only CREATE TABLE and USE statements are processed. Other statements are ignored and generate a warning message.
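As an illustration of the script-based workflow, the sketch below writes a small CQL file and passes it to dse spark-schema using the --script and --output options listed above. The keyspace, table, file, and directory names are hypothetical, and the dse call is guarded so the sketch only prints the intended command on machines where DataStax Enterprise is not installed.

```shell
# Hypothetical working directory, script name, and output directory.
work_dir="$(mktemp -d)"
script_file="$work_dir/schema.cql"
out_dir="$work_dir/context"

# Only USE and CREATE TABLE statements are processed; comments are allowed.
cat > "$script_file" <<'EOF'
-- Illustrative schema, not taken from a real application.
USE demo;
CREATE TABLE users (
  user_id int PRIMARY KEY,
  name text
);
EOF

# Generate the context only where the dse command is available.
if command -v dse >/dev/null 2>&1; then
  dse spark-schema --script="$script_file" --output="$out_dir"
else
  echo "dse not found; would run: dse spark-schema --script=$script_file --output=$out_dir"
fi
```

Any other statement placed in the script, such as an INSERT, would be skipped with a warning according to the description above.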