dse spark-submit

Launches applications on a cluster to enable use of Spark cluster managers through a uniform interface. This command supports the same options as Apache Spark spark-submit.

Restriction: Command is supported only on nodes with analytics workloads.

Synopsis

dse spark-submit
  --class class_name jar_file other_options |
  ( --status | --kill ) driver_id [--master master_ip_address]
Table 1. Legend

UPPERCASE
    Literal keyword.
Lowercase
    Not literal.
Italics
    Variable value. Replace with a valid option or user-defined value.
[ ]
    Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the square brackets.
( )
    Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.
|
    Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.
...
    Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.
'Literal string'
    Single quotation marks ( ' ) must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.
{ key:value }
    Map collection. Braces ( { } ) enclose map collections or key-value pairs. A colon separates the key and the value.
<datatype1,datatype2>
    Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
cql_statement;
    End CQL statement. A semicolon ( ; ) terminates all CQL statements.
[ -- ]
    Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.
' <schema> ... </schema> '
    Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.
@xml_entity='xml_entity_type'
    Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrconfig files.

This command supports the same options as Apache Spark spark-submit. Unlike the standard behavior for the Spark status and kill options, in DSE deployments these options do not require the Spark Master IP address.

--kill driver_id
    Kill a Spark application running in the DSE cluster.
--master master_ip_address
    The IP address of the Spark Master running in the DSE cluster. Optional; in DSE deployments the Spark Master address is discovered automatically.
--status driver_id
    Get the status of a Spark application running in the DSE cluster.

Examples

Run the HTTP response example program (located in the dse-demos directory) on two nodes:
dse spark-submit --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d 2

To submit an application in cluster mode, using the supervise option to restart the driver in case of failure

dse spark-submit --deploy-mode cluster --supervise --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d $NUM_SPARK_NODES
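The $NUM_SPARK_NODES variable above is not set by DSE; it is a user-defined shell variable. A minimal sketch of defining it and assembling the same submission command (printed here as a dry run rather than executed):

```shell
# NUM_SPARK_NODES is user-defined, not set by DSE; assign it the number of
# analytics nodes the application should use.
NUM_SPARK_NODES=2

# Assemble the submission arguments and print the resulting command (dry run).
args="--deploy-mode cluster --supervise --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d $NUM_SPARK_NODES"
echo "dse spark-submit $args"
```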

To submit an application in cluster mode when TLS is enabled

Pass the SSL configuration with standard Spark options to use secure HTTPS on port 4440.

dse spark-submit --conf spark.ssl.ui.enabled=true --conf spark.ssl.ui.keyPassword=ctool_keystore \
  --conf spark.ssl.ui.keyStore=/home/automaton/ctool_security/ctool_keystore

To set the driver host to a publicly accessible IP address

dse spark-submit --conf spark.driver.host=203.0.113.0 myApplication.jar
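A common pattern is to keep the address in a shell variable so the same command works across environments. A sketch, where DRIVER_HOST is a hypothetical variable name and 203.0.113.0 is a documentation-reserved example address to replace with your host's real public IP:

```shell
# Hypothetical variable holding the publicly accessible address of the
# submitting host; 203.0.113.0 is a documentation-only example address.
DRIVER_HOST=203.0.113.0

# Dry run: print the command rather than executing it.
echo "dse spark-submit --conf spark.driver.host=$DRIVER_HOST myApplication.jar"
```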

To get the status of a driver

Unlike the Apache Spark option, you do not have to specify the Spark Master IP address.

dse spark-submit --status driver-20180726160353-0019
Result when the driver exists:
Driver driver-20180726160353-0019 found: state=<state>, worker=<workerId> (<workerHostPort>)
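When scripting against this output, the state field can be extracted with standard text tools. A sketch, assuming the single-line format shown above (the driver ID and field values below are illustrative):

```shell
# Illustrative status line in the format shown above.
status_line='Driver driver-20180726160353-0019 found: state=RUNNING, worker=worker-20180726160000-10.0.0.5-7077 (10.0.0.5:7077)'

# Pull out the state field (for example RUNNING, FINISHED, or KILLED).
state=$(printf '%s\n' "$status_line" | sed -n 's/.*state=\([^,]*\),.*/\1/p')
echo "$state"
```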

To kill a driver

Unlike the Apache Spark option, you do not have to specify the Spark Master IP address.

dse spark-submit --kill driver-20180726160353-0019