dse spark-submit

Launches applications on a cluster to enable use of Apache Spark™ cluster managers through a uniform interface. This command supports the same options as Apache Spark spark-submit.

This command is supported only on nodes with analytics workloads.

Synopsis

dse spark-submit --class <class_name> <jar_file> <other_options> |
    ( --status | --kill ) <driver_id> [--master <master_ip_address>]
Syntax legend

The following conventions appear in the synopsis and code samples.

Italic, bold, or < >

Syntax diagrams and code samples use one or more of these styles to mark placeholders for variable values. Replace placeholders with a valid option or your own user-defined value.

In CQL statements, angle brackets are required to enclose data types in a set, list, map, or tuple. Separate the data types with a comma. For example: <datatype1>,<datatype2>

In Search CQL statements, use angle brackets to identify the entity and literal value to overwrite the XML element in the schema and solrconfig files, such as @<xml_entity>='<xml_entity_type>'.

[ ]

Square brackets surround optional command arguments. Do not type the square brackets.

( )

Parentheses identify a group to choose from. Do not type the parentheses.

|

A pipe separates alternative elements. Type any one of the elements. Do not type the pipe.

...

Indicates that you can repeat the syntax element as often as required.

'

Use single quotation marks to surround literal strings in CQL statements and to preserve upper case. For Search CQL only: single quotation marks surround an entire XML schema declaration, such as '<schema> ... </schema>'

{ }

Map collection. Curly braces enclose maps ({ <key_datatype>:<value_datatype> }) or key-value pairs ({ <key>:<value> }). A colon separates the key and the value.

;

Ends a CQL statement.

--

Two hyphens separate command line options from command arguments. This syntax is useful when arguments might otherwise be mistaken for command line options.

This command supports the same options as Apache Spark spark-submit. Unlike standard Apache Spark™ behavior, the status and kill options in DSE deployments do not require the Spark Master IP address.

--kill <driver_id>

Kill a Spark application running in the DSE cluster.

--master <master_ip_address>

The IP address of the Spark Master running in the DSE cluster.

--status <driver_id>

Get the status of a Spark application running in the DSE cluster.

Examples

Run the HTTP response example program (located in the dse-demos directory) on two nodes:

dse spark-submit --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d 2

To submit an application in cluster mode with the supervise option, which restarts the driver if it fails:

dse spark-submit --deploy-mode cluster --supervise --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d $NUM_SPARK_NODES

To submit an application in cluster mode when TLS is enabled:

Pass the SSL configuration using standard Spark options to serve the web UI over HTTPS on port 4440.

dse spark-submit \
--conf spark.ssl.ui.enabled=true \
--conf spark.ssl.ui.keyPassword=<keystore_password> \
--conf spark.ssl.ui.keyStore=<path_to_keystore> \
myApplication.jar

To set the driver host to a publicly accessible IP address:

dse spark-submit --conf spark.driver.host=203.0.113.0 myApplication.jar

To get the status of a driver:

Unlike the Apache Spark option, you do not have to specify the Spark Master IP address.

dse spark-submit --status driver-20180726160353-0019

Result when the driver exists:

Driver driver-20180726160353-0019 found: state=<state>, worker=<workerId> (<workerHostPort>)

To kill a driver:

Unlike the Apache Spark option, you do not have to specify the Spark Master IP address.

dse spark-submit --kill driver-20180726160353-0019
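In cluster mode, the driver ID needed by --status and --kill appears in the submission output and in status results. As a minimal sketch, the helper below (hypothetical, not part of DSE) extracts a driver ID of the form driver-<timestamp>-<sequence> from captured output so it can be reused in later commands:

```shell
# Hypothetical helper: pull the first driver ID of the form
# driver-<14-digit timestamp>-<4-digit sequence> out of piped text.
extract_driver_id() {
  # grep -o prints only the matching token; head -n1 keeps the first match
  grep -o 'driver-[0-9]\{14\}-[0-9]\{4\}' | head -n 1
}

# Illustrative usage (assumes the cluster-mode submission output
# includes the assigned driver ID):
#   driver_id=$(dse spark-submit --deploy-mode cluster myApplication.jar 2>&1 | extract_driver_id)
#   dse spark-submit --status "$driver_id"
```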


© Copyright IBM Corporation 2025
