dse spark

Enters an interactive Apache Spark™ shell and offers basic auto-completion.

Restriction: Command is supported only on nodes with analytics workloads.

For details on using Apache Spark™ with DSE, see:

Synopsis

dse <connection_options> spark
[-framework dse|spark-2.0] [--help] [--verbose]
[--conf name=spark.value|<sparkproperties.conf>]
[--executor-memory <mem>]
[--jars <additional-jars>]
[--master dse://?appReconnectionTimeoutSeconds=<secs>]
[--properties-file <path_to_properties_file>]
[--total-executor-cores <cores>]
[-i <app_script_file>]

Syntax conventions	Description
UPPERCASE	Literal keyword.
Lowercase	Not literal.
`<Italics>`	Variable value. Replace with a valid option or user-defined value.
`[ ]`	Optional. Square brackets ( `[ ]` ) surround optional command arguments. Do not type the square brackets.
`( )`	Group. Parentheses ( `( )` ) identify a group to choose from. Do not type the parentheses.
`\|`	Or. A vertical bar ( `\|` ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.
`...`	Repeatable. An ellipsis ( `...` ) indicates that you can repeat the syntax element as often as required.
`'<Literal string>'`	Single quotation ( `'` ) marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.
`{ <key>:<value> }`	Map collection. Braces ( `{ }` ) enclose map collections or key value pairs. A colon separates the key and the value.
`<<datatype1>,<datatype2>>`	Set, list, map, or tuple. Angle brackets ( `< >` ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
`cql_statement;`	End CQL statement. A semicolon ( `;` ) terminates all CQL statements.
`[ -- ]`	Separate the command line options from the command arguments with two hyphens ( `--` ). This syntax is useful when arguments might be mistaken for command line options.
`' <<schema> ... </schema> >'`	Search CQL only: Single quotation marks ( `'` ) surround an entire XML schema declaration.
`@<xml_entity>='<xml_entity_type>'`	Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrconfig files.

In general, Spark submission arguments (--<submission_args>) are translated into JVM system properties (-Dname=value) and other JVM parameters, such as the classpath. Application arguments (-<app_args>) are passed directly to the application.
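
The following Python sketch is a rough, hypothetical model of that split, not the DSE launcher. It turns --conf, --executor-memory, and --total-executor-cores into -D flags and passes everything else, such as -i <file>, through to the application. The property names spark.executor.memory and spark.cores.max follow Apache Spark's spark-submit conventions. Other submission options are not modeled.

```python
from typing import List, Tuple

# Hypothetical, simplified model: submission flags that take a value and the
# Spark property each one sets (names follow Apache Spark's spark-submit).
SUBMISSION_FLAGS = {
    "--executor-memory": "spark.executor.memory",
    "--total-executor-cores": "spark.cores.max",
}


def split_arguments(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Return (JVM -D flags, arguments passed through to the application)."""
    jvm_flags: List[str] = []
    app_args: List[str] = []
    i = 0
    while i < len(argv):
        flag = argv[i]
        has_value = i + 1 < len(argv)
        if flag == "--conf" and has_value:
            # --conf already carries name=value, so it maps directly to -Dname=value.
            jvm_flags.append("-D" + argv[i + 1])
            i += 2
        elif flag in SUBMISSION_FLAGS and has_value:
            jvm_flags.append(f"-D{SUBMISSION_FLAGS[flag]}={argv[i + 1]}")
            i += 2
        else:
            # Anything not modeled above, such as -i <file>, goes to the application.
            app_args.append(flag)
            i += 1
    return jvm_flags, app_args


jvm, app = split_arguments(
    ["--conf", "spark.sql.caseSensitive=true", "--executor-memory", "2g", "-i", "init.scala"]
)
print(jvm)  # ['-Dspark.sql.caseSensitive=true', '-Dspark.executor.memory=2g']
print(app)  # ['-i', 'init.scala']
```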

Configure the Spark shell with these arguments:

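--conf name=spark.value|sparkproperties.conf

An arbitrary Spark option to add to the Spark configuration. Property names are prefixed by spark., for example spark.sql.caseSensitive=true.

name=spark.value - a single Spark property and its value
sparkproperties.conf - a configuration file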
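
When a script builds the command line, each property needs its own --conf argument. The following Python helper is hypothetical. Rejecting names that lack the spark. prefix is a design choice of this sketch and does not describe how DSE itself reacts to such names.

```python
import shlex
from typing import Dict, List


def conf_arguments(properties: Dict[str, str]) -> List[str]:
    """Build one '--conf name=value' pair per Spark property (hypothetical helper)."""
    args: List[str] = []
    for name, value in properties.items():
        # Design choice for this sketch: fail fast on names without the spark. prefix.
        if not name.startswith("spark."):
            raise ValueError(f"Spark property names must start with 'spark.': {name}")
        args += ["--conf", f"{name}={value}"]
    return args


command = ["dse", "spark"] + conf_arguments({"spark.sql.caseSensitive": "true"})
print(shlex.join(command))  # dse spark --conf spark.sql.caseSensitive=true
```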

--executor-memory mem: The amount of memory that each executor can consume for the application. Apache Spark uses a 512 MB default. Specify the memory argument in JVM format using the k, m, or g suffix.
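
A small validator can check a value before you pass it to --executor-memory. This Python sketch is an illustrative assumption. It accepts only a whole number followed by one of the k, m, or g suffixes named above, and it treats the suffixes as binary multiples (1k = 1024 bytes), as JVM heap options such as -Xmx do.

```python
import re

# Binary multiples, as with JVM options such as -Xmx: 1k = 1024 bytes.
_MULTIPLIERS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def jvm_memory_to_bytes(value: str) -> int:
    """Convert a JVM-style memory string such as '512m' to bytes (sketch)."""
    match = re.fullmatch(r"(\d+)([kmg])", value.strip().lower())
    if match is None:
        raise ValueError(f"expected <number> followed by k, m, or g, got {value!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix]


print(jvm_memory_to_bytes("512m"))  # 536870912
print(jvm_memory_to_bytes("2g"))    # 2147483648
```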

-framework dse|spark-2.0

The classpath for the Spark shell. When not set, the default is dse.

dse - Sets the Spark classpath to the same classpath that is used by the DataStax Enterprise (DSE) server.

spark-2.0 - Sets a classpath that is used by the open source Apache Spark (OSS) 2.0 release to accommodate applications originally written for open source Apache Spark. It uses a BYOS (Bring Your Own Spark) JAR with shaded references to internal dependencies, which reduces complexity when porting an application from OSS Apache Spark.

Applications whose code already works on DSE do not require the spark-2.0 framework. Full support in the spark-2.0 framework might require specifying additional dependencies. For example, hadoop-aws is included on the DSE server classpath but is not present on the OSS Apache Spark 2.0 classpath, so applications that use S3 or other AWS APIs must include their own aws-sdk on the runtime classpath. These additional runtime dependencies are required only for applications that cannot run on the DSE classpath.
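
The following Python sketch shows one way to assemble a command that selects the spark-2.0 framework and supplies an extra dependency through --jars. The helper name and the JAR path are hypothetical placeholders. The option spelling is copied from the synopsis. Because --jars takes a comma-separated list, the sketch rejects paths that themselves contain a comma.

```python
import shlex
from typing import List


def spark2_shell_command(extra_jars: List[str]) -> List[str]:
    """Hypothetical helper: start the shell on the spark-2.0 (BYOS) classpath and
    add JARs that the OSS classpath lacks, such as an aws-sdk JAR."""
    for jar in extra_jars:
        # --jars takes a comma-separated list, so a comma inside a path would split it.
        if "," in jar:
            raise ValueError(f"JAR path must not contain a comma: {jar}")
    command = ["dse", "spark", "-framework", "spark-2.0"]
    if extra_jars:
        command += ["--jars", ",".join(extra_jars)]
    return command


# The JAR path below is a placeholder, not a file that DSE ships.
print(shlex.join(spark2_shell_command(["/opt/jars/aws-sdk.jar"])))
# dse spark -framework spark-2.0 --jars /opt/jars/aws-sdk.jar
```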

--help: Shows a help message that displays all options except DSE Spark shell options.

-i app_script_file: Spark shell application argument that runs a script from the specified file.
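
The following Python sketch writes a short Scala startup script and builds the matching -i invocation. The file name init.scala and the script contents are hypothetical. The script assumes that the shell predefines `spark` as the SparkSession, as the Apache Spark 2.x shell does.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical startup script. Inside the Spark shell, `spark` is assumed to be
# the predefined SparkSession, so the script can use it directly.
SCRIPT = """\
val numbers = spark.range(5)
println(numbers.count())
"""


def shell_with_script(directory: str) -> List[str]:
    """Write the script to <directory>/init.scala and return the dse spark command."""
    script = Path(directory) / "init.scala"
    script.write_text(SCRIPT)
    return ["dse", "spark", "-i", str(script)]


with tempfile.TemporaryDirectory() as tmp:
    command = shell_with_script(tmp)
    print(command[:3])  # ['dse', 'spark', '-i']
```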

--jars path_to_additional_jars: A comma-separated list of paths to additional JAR files.

--master dse://?appReconnectionTimeoutSeconds=secs: A custom timeout value when submitting the application, useful for troubleshooting Spark application failures. The default timeout value is 5 seconds.
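
The following Python sketch builds the master URL with the standard library's urlencode. Rejecting non-positive timeouts is an assumption of the sketch. When you type the URL in an interactive shell, quoting it prevents the shell from treating ? as a wildcard. shlex.join adds those quotes automatically.

```python
import shlex
from urllib.parse import urlencode


def dse_master_url(timeout_seconds: int) -> str:
    """Build the dse:// master URL with a reconnection timeout (sketch)."""
    # Rejecting non-positive values is an assumption of this sketch.
    if timeout_seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return "dse://?" + urlencode({"appReconnectionTimeoutSeconds": timeout_seconds})


command = ["dse", "spark", "--master", dse_master_url(10)]
print(shlex.join(command))
# dse spark --master 'dse://?appReconnectionTimeoutSeconds=10'
```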

--properties-file path_to_properties_file: The location of the properties file that has the configuration settings. By default, Spark loads the settings from spark-defaults.conf.
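
A properties file passed with --properties-file can follow the spark-defaults.conf layout, with one property per line and the name and value separated by whitespace. The following Python helper writes such a file and returns the command that loads it. The helper name, file name, and property values are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def write_properties_file(path: Path, properties: Dict[str, str]) -> List[str]:
    """Write one 'name value' pair per line, the layout used by spark-defaults.conf,
    and return a dse spark command that loads the file (hypothetical helper)."""
    lines = [f"{name} {value}" for name, value in properties.items()]
    path.write_text("\n".join(lines) + "\n")
    return ["dse", "spark", "--properties-file", str(path)]


with tempfile.TemporaryDirectory() as tmp:
    conf_file = Path(tmp) / "shell.conf"
    command = write_properties_file(
        conf_file, {"spark.executor.memory": "2g", "spark.sql.caseSensitive": "true"}
    )
    print(conf_file.read_text())
```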

--total-executor-cores cores: The total number of cores the application uses.

--verbose: Displays which arguments are recognized as Spark configuration options and which arguments are forwarded to the Spark shell.

Examples

Start the Apache Spark shell

dse spark

Start the Apache Spark shell with case-sensitivity

DseGraphFrame and Spark SQL are case insensitive by default, so column names that differ only in case cause conflicts. Setting the Spark property spark.sql.caseSensitive to true avoids these conflicts.

dse spark --conf spark.sql.caseSensitive=true

Set the timeout value to 10 seconds

dse spark --master dse://?appReconnectionTimeoutSeconds=10

This setting is useful for troubleshooting. See Detecting Apache Spark application failures.