dse spark-submit
Launches applications on a cluster to enable use of Apache Spark™ cluster managers through a uniform interface. This command supports the same options as Apache Spark spark-submit.
This command is supported only on nodes with analytics workloads.
Synopsis
dse spark-submit
  --class <class_name> <jar_file> <other_options> |
  --status|--kill <driver_id> [--master <master_ip_address>]
Syntax legend
| Syntax conventions | Description |
|---|---|
| *Italics* | Syntax diagrams and code samples use italic, bold, or angle-bracketed text to mark placeholders for variable values. Replace placeholders with a valid option or your own user-defined value. |
| `< >` | In CQL statements, angle brackets enclose the data types in a set, list, map, or tuple. Separate the data types with a comma. In Search CQL statements, angle brackets identify the entity and literal value to overwrite the XML element in the schema. |
| `[ ]` | Square brackets surround optional command arguments. Do not type the square brackets. |
| `( )` | Parentheses identify a group to choose from. Do not type the parentheses. |
| `\|` | A pipe separates alternative elements. Type any one of the elements. Do not type the pipe. |
| `...` | Indicates that you can repeat the syntax element as often as required. |
| `' '` | Single quotation marks surround literal strings in CQL statements and preserve upper case. For Search CQL only: single quotation marks surround an entire XML schema declaration. |
| `{ }` | Map collection. Curly braces enclose maps (key-value pairs). |
| `;` | Ends a CQL statement. |
| `--` | Separate command line options from command arguments with two hyphens. This syntax is useful when arguments might be mistaken for command line options. |
This command supports the same options as Apache Spark spark-submit. Unlike the standard behavior of the Apache Spark™ --status and --kill options, in DSE deployments these options do not require the Spark Master IP address.
--kill <driver_id>
    Kill a Spark application running in the DSE cluster.
--master <master_ip_address>
    The IP address of the Spark Master running in the DSE cluster.
--status <driver_id>
    Get the status of a Spark application running in the DSE cluster.
Examples
Run the HTTP response example program (located in the dse-demos directory) on two nodes:
dse spark-submit --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d 2
To submit an application in cluster mode with the supervise option, which restarts the driver in case of failure:
dse spark-submit --deploy-mode cluster --supervise --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d $NUM_SPARK_NODES
To submit an application in cluster mode when TLS is enabled
Pass the SSL configuration with standard Spark commands to use secure HTTPS on port 4440.
dse spark-submit \
--conf spark.ssl.ui.enabled=true \
  --conf spark.ssl.ui.keyPassword=<keystore_password> \
  --conf spark.ssl.ui.keyStore=<path_to_keystore> \
myApplication.jar
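When the same SSL options are reused across submissions, it can help to collect them in a shell variable. The sketch below assumes a hypothetical keystore path and password (replace both with your own values); it only echoes the assembled command for illustration rather than executing it.

```shell
#!/bin/sh
# Hypothetical values -- substitute your own keystore path and password.
KEYSTORE_PATH=/etc/dse/conf/keystore.jks
KEYSTORE_PASS=changeit

# Collect the SSL configuration flags once so every submission reuses them.
SSL_ARGS="--conf spark.ssl.ui.enabled=true \
--conf spark.ssl.ui.keyStore=${KEYSTORE_PATH} \
--conf spark.ssl.ui.keyPassword=${KEYSTORE_PASS}"

# Echo the full command; in a real cluster, run it without the echo.
echo dse spark-submit ${SSL_ARGS} myApplication.jar
```

On a real analytics node you would drop the `echo` and run the command directly against your application JAR.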
To set the driver host to a publicly accessible IP address
dse spark-submit --conf spark.driver.host=203.0.113.0 myApplication.jar
To get the status of a driver
Unlike the Apache Spark option, you do not have to specify the Spark Master IP address.
dse spark-submit --status driver-20180726160353-0019
Result when the driver exists:
Driver driver-20180726160353-0019 found: state=<state>, worker=<workerId> (<workerHostPort>)
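If you script around the status output shown above, the state field can be extracted with standard text tools. This sketch assumes the output format printed above and uses a made-up worker identifier for illustration.

```shell
#!/bin/sh
# Sample line in the format shown above; the worker ID here is hypothetical.
STATUS_LINE='Driver driver-20180726160353-0019 found: state=RUNNING, worker=worker-20180726155240-10.0.0.5-7077 (10.0.0.5:7077)'

# Pull out the value of the state= field (e.g. RUNNING, FINISHED, KILLED).
STATE=$(printf '%s\n' "$STATUS_LINE" | sed -n 's/.*state=\([A-Z]*\).*/\1/p')
echo "$STATE"
```

In practice you would pipe the output of `dse spark-submit --status <driver_id>` into the same `sed` expression.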
To kill a driver
Unlike the Apache Spark option, you do not have to specify the Spark Master IP address.
dse spark-submit --kill driver-20180726160353-0019