dse spark-submit

Launches applications on a cluster to enable use of Spark cluster managers through a uniform interface. This command supports the same options as Apache Spark spark-submit.

Restriction: Command is supported only on nodes with analytics workloads.

Synopsis

dse spark-submit
--class class_name jar_file other_options |
( --status | --kill ) driver_id [--master master_ip_address]
Table 1. Legend

UPPERCASE - Literal keyword.
Lowercase - Not literal.
Italics - Variable value. Replace with a valid option or user-defined value.
[ ] - Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the square brackets.
( ) - Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.
| - Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.
... - Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.
'Literal string' - Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.
{ key:value } - Map collection. Braces ( { } ) enclose map collections or key-value pairs. A colon separates the key and the value.
<datatype1,datatype2> - Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
cql_statement; - End CQL statement. A semicolon ( ; ) terminates all CQL statements.
[ -- ] - Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.
' <schema> ... </schema> ' - Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.
@xml_entity='xml_entity_type' - Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrconfig files.

Unlike the standard behavior of the Spark status and kill options, in DSE deployments these options do not require the Spark Master IP address.

--kill driver_id
Kill a Spark application running in the DSE cluster.
--master master_ip_address
The IP address of the Spark Master running in the DSE cluster. Optional with --status and --kill in DSE deployments.
--status driver_id
Get the status of a Spark application running in the DSE cluster.

Examples

Run the HTTP response example program (located in the dse-demos directory) on two nodes. The -d 2 argument after the JAR file is passed to the application itself, not to spark-submit:
dse spark-submit --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d 2
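
Because the command accepts the same options as Apache Spark spark-submit, standard resource flags can be added to the same invocation. A sketch using the standard --executor-memory and --total-executor-cores flags (the values shown are illustrative):

dse spark-submit --class com.datastax.HttpSparkStream --executor-memory 2G --total-executor-cores 4 target/HttpSparkStream.jar -d 2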

To submit an application in cluster mode, using the supervise option to restart the driver in case of failure

dse spark-submit --deploy-mode cluster --supervise --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d $NUM_SPARK_NODES
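
Note that $NUM_SPARK_NODES is an ordinary shell variable, not one set by DSE; define it before submitting, for example:

export NUM_SPARK_NODES=2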

To submit an application using cluster mode when TLS is enabled

Pass the SSL configuration with standard Spark commands to use secure HTTPS on port 4440.

dse spark-submit \
--conf spark.ssl.ui.enabled=true \
--conf spark.ssl.ui.keyPassword=keystore_password \
--conf spark.ssl.ui.keyStore=path_to_keystore \
myApplication.jar
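
Alternatively (a sketch, assuming your installation reads the standard spark-defaults.conf file), the same SSL properties can be set there instead of on the command line:

spark.ssl.ui.enabled true
spark.ssl.ui.keyPassword keystore_password
spark.ssl.ui.keyStore path_to_keystore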

To set the driver host to a publicly accessible IP address

dse spark-submit --conf spark.driver.host=203.0.113.0 myApplication.jar
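
If the driver must also listen on a fixed port, for example to pass through a firewall, a sketch combining the host setting with Spark's standard spark.driver.port property (the port value is illustrative):

dse spark-submit --conf spark.driver.host=203.0.113.0 --conf spark.driver.port=45000 myApplication.jar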

To get the status of a driver

Unlike the Apache Spark option, you do not have to specify the Spark Master IP address.

dse spark-submit --status driver-20180726160353-0019
Result when the driver exists:
Driver driver-20180726160353-0019 found: state=<state>, worker=<workerId> (<workerHostPort>)

To kill a driver

Unlike the Apache Spark option, you do not have to specify the Spark Master IP address.

dse spark-submit --kill driver-20180726160353-0019